Abstract

We consider the design of experiments to evaluate treatments that are administered 
by self-interested agents, each seeking to achieve the highest evaluation and win the 
experiment. For example, in an advertising experiment, a company wishes to evaluate 
two marketing agents in terms of their efficacy in viral marketing, and assign a contract 
to the winner agent. Contrary to traditional experimental design, this problem has 
two new implications. First, the experiment induces a game among agents, where each 
agent can select from multiple versions of the treatment it administers. Second, the 
action of one agent - selection of treatment version - may affect the actions of another 
agent, with the resulting strategic interference complicating the evaluation of agents. 
An incentive-compatible experiment design is one with an equilibrium where each agent
selects its natural action, which is the action that would maximize the performance of
the agent if there were no competition (e.g., the expected number of conversions if the agent
were assigned the contract).

Under a general formulation of experimental design, we identify sufficient conditions 
that guarantee incentive-compatible experiments. These conditions rely on the existence
of statistics that can estimate how agents would perform without competition,
and their use in constructing score functions to evaluate the agents. In the setting with 
no strategic interference, we also study the power of the design, i.e., the probability 
that the best agent wins, and show how to improve the power of incentive-compatible 
designs. From the technical side, our theory uses a range of statistical methods such 
as hypothesis testing, variance-stabilizing transformations and the Delta method, all 
of which rely on asymptotics. 


1 Introduction 

Experiments are the gold standard for evaluating the effects of different treatments. The
design of experiments is crucial in order to avoid systematic biases and to minimize random
errors in the statistical evaluation of treatment effects [6]. There are three fundamental
concepts in any experiment design. The treatment is a well-defined prescription or set of
rules, e.g., a pharmaceutical drug, a marketing campaign, or a new material. The goal of
the experiment is to evaluate the effects of different treatments. The experimental unit is 
the indivisible entity that will receive a treatment within the experiment, e.g., a patient, 
a potential customer, or a factory process. Typically, every unit receives only one treat¬ 
ment, but there are important exceptions as well. The treatment is assigned according to a 
treatment assignment rule specified by the design and necessarily involves randomization in 
order to avoid systematic biases. When a unit receives the treatment it exhibits a measurable 
outcome, e.g., a health assessment, a product purchase or not, or a material failure rate. 

Statistical analysis of unit outcomes is necessary for the evaluation of treatments because 
it accounts for the errors that are inherent to randomization of treatment and the measurement
process. A key idea in experimental design is blocking. Background information on
units is almost always available, e.g., age, gender, socioeconomic status, health status, and 
so on. If an experimenter believes that units’ outcomes vary systematically with respect to 
such covariate information, then it is necessary to block units with respect to the available 
covariates. Blocking helps to avoid systematic bias and variability that is not of scientific 
interest. The unofficial mantra in experimental design is “block what you can and randomize
what you cannot” (Box et al.).

To illustrate, consider the example of a new flu shot. A pharmaceutical company, the 
experimenter, wants to compare between the new flu shot and a baseline that is currently in 
the market. The treatments are the two flu shots. The experimenter has a set of volunteer 
patients who form the set of experimental units. When a unit receives a treatment the 
outcome is whether the unit got flu or not for the three months following the treatment. As 
a treatment assignment rule, the experimenter could simply give the new flu shot to half of 
the patients at random, and give the baseline to the other half. However, the outcomes could 
be confounded with factors such as age (older people are more vulnerable to flu), geography 
(urban areas are more crowded and possibly more contagious), occupation, and so on. In a 
blocking design, the experimenter could block the population based on age and occupation, 
and perform the randomization within blocks. 
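
The blocking design described above can be sketched in a few lines. This is a minimal illustration, not the procedure of any particular study: the function name `blocked_assignment`, the block labels, and the round-robin split within each block are assumptions made for the example.

```python
import random
from collections import defaultdict

def blocked_assignment(units, block_of, treatments, seed=0):
    """Randomize treatments separately within each block (hypothetical helper).

    units:      list of unit identifiers
    block_of:   dict mapping each unit to its block label (e.g., an age group)
    treatments: list of treatment labels
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for u in units:
        blocks[block_of[u]].append(u)
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        # Deal treatments round-robin so each block is split as evenly as possible.
        for pos, u in enumerate(shuffled):
            assignment[u] = treatments[pos % len(treatments)]
    return assignment

# Illustrative data: eight patients in two age blocks.
units = list(range(8))
block_of = {u: ("older" if u < 4 else "younger") for u in units}
assignment = blocked_assignment(units, block_of, ["new", "baseline"])
```

Because the shuffle happens inside each block, every block contributes equally to both treatment arms, so a block-level covariate such as age cannot be confounded with the treatment.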

There are two crucial assumptions in experimental design and the related topic of causal 
inference, collectively known as the stable unit treatment value assumption (SUTVA) [10].
First, there are no hidden versions of a treatment. In the previous example, this means that 
there are no strong or weak versions of the new flu shot. Otherwise, the outcomes would 
be confounded with the hidden version of the treatment. This is an important problem, 
especially in social science studies. For example, in an educational study a new treatment 
could be a new type of curriculum, however a possible hidden version of the treatment is 
the delivery method by each teacher. A second crucial assumption is that of no interference 
among experimental units. Interference is present when the treatment assignment on one 
unit affects the outcome of another unit. In the flu shot example, a unit that is not vaccinated 
is still protected when the friends of the unit are vaccinated. Neither of these assumptions
holds in our setting.

We introduce the idea of incentive-compatible experimental design in the context of viral
marketing.^1 Imagine a company that designs a test to determine which of two vendors has


1 An early extended abstract of this paper was presented at the Conference on Digital Experimentation
at MIT (Toulis et al. [13]).
the best algorithm for running an advertising campaign. The firm uses randomization to
prevent systematic bias, and defines a criterion for success; e.g., the number of conversions 
over a two week period. The winning vendor is promised a one-year contract with the firm 
running the test. One challenge in this setting is that the vendors might deviate from how 
they would normally run a campaign, trying to win the test. For example, a lower quality 
vendor may try to follow a more aggressive strategy, hoping to get lucky. This is a problem for 
the firm designing the test, who wants to get an unbiased estimate of the usual performance 
of the vendor. Another challenge comes from interference between the participants. In viral
marketing, for example, one vendor may try to free-ride on word-of-mouth effects that come 
from another vendor. 

1.1 Results 

A first contribution of the present paper is to formalize this problem of incentive-compatible 
experimental design. The difference with traditional experimental design is that, in our 
framework, strategic agents administer the treatments to be evaluated, and each agent can
select from multiple treatment versions. In this way, the experiment induces a non-cooperative
game. The action available to an agent in the resulting treatment selection game is the
version of the treatment that the agent will administer to its assigned units. The experimenter
has a performance metric to evaluate each treatment version. This is the quantity of interest 
to the experimenter. Each agent has a natural action, which is the action that maximizes 
its performance, and is assumed to be the way the agent would act if not competing in the 
game. The quality of an agent is the maximum value of the performance metric, achieved 
when the agent plays the natural action without competition from other agents. The goal 
of the experimenter is to design an experiment to estimate the agent of highest quality. An 
incentive-compatible experiment design is one with an equilibrium in which each agent’s best 
response is to select the treatment version corresponding to its natural action. We will focus 
on dominant-strategy equilibrium in this paper. 

We show that incentive-compatible designs are possible when an identifying statistic
exists that can estimate the quality difference between agents (the main result of Section 3).
Critically, the variance of such a statistic has to be less sensitive to agent actions than its
expected value; otherwise an agent can take advantage of the variance of the statistic. Under a
no-interference assumption, a class of incentive-compatible designs can be constructed through a
variance-stabilizing transformation (Theorem 4.1), which makes the variance of the identifying
statistic insensitive to agent actions; a worse agent cannot hope to increase its chances by
being more aggressive. This leads to results that may sound counter-intuitive. For example, 
in a viral marketing application where performance is the expected number of conversions, 
and where higher expected conversions also correspond to increasingly higher risks, it is not 
incentive-compatible to select as the winner the agent with the highest average performance; 
rather, it is incentive-compatible to select as the winner the agent with the lowest reciprocal 
of average performance (see Example 2(d)). 

Identifying statistics and incentive-compatible designs are generally harder to obtain 
under strategic interference. However, under specific modeling assumptions about the
interference, better designs can yield more information about the agent performances, and thus
produce identifying statistics. We illustrate this idea in a viral marketing example, which
we reuse throughout this paper.


2 Preliminaries 

In this section we introduce notation for the operational and statistical components of 
incentive-compatible experimental design. The operational components include the treatment
assignment, the treatment selection game and the experiment outcomes. The statistical
components include the estimand (the quantity of interest to the experimenter) and the
estimators, i.e., the data statistics used to estimate the estimand.

2.1 Treatment assignment 

Let $U = \{1, 2, \ldots, m\}$ denote the set of experimental units, indexed by $u$, and $X = \{1, 2, \ldots, n\}$
denote the set of agents, indexed by $i$. Each agent, for example, a marketing firm or a drug
company, represents a treatment to be evaluated. An experimenter needs to design the
experiment that will evaluate the agents. Relative to traditional experimental design, the new
aspect is that each agent is associated with a set of treatment versions and each agent has a
strategic choice about which version to administer in the experiment. We make this precise
in Section 2.2.

For each unit $u \in U$ there is covariate information that is common knowledge to agents
and the experimenter. We assume the experimenter uses covariates to split units into blocks,
such that units within one block are similar in terms of covariates, e.g., similar age, gender,
income, etc. Without loss of generality, we will assume there is just a single block. In
Appendix A of this paper, we discuss how the theory can be extended to multiple blocks.

A treatment assignment rule $\psi$ assigns each unit to a single agent. Let $Z = (Z_u)$ denote
the $m \times 1$ assignment vector, such that $Z_u = i$ indicates that unit $u$ is assigned to agent $i$.
The assignment rule $\psi$ is a probability distribution over all possible assignments $Z$. Without
loss of generality, we assume that the number of units $m$ is a multiple of the number of agents
$n$. We will also assume complete randomization, such that $Z_u = i$ for exactly $k = m/n$
units, for each agent $i$.
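
Complete randomization as described above can be sketched as follows. The function name `complete_randomization` and the use of a seeded generator are assumptions made for illustration; any procedure that draws uniformly among assignments with exactly $k = m/n$ units per agent would serve.

```python
import random

def complete_randomization(m, n, seed=0):
    """Return an assignment vector Z of length m with entries in 1..n.

    Each agent receives exactly k = m / n units; which units it receives
    is decided by a uniform random shuffle.
    """
    if m % n != 0:
        raise ValueError("m must be a multiple of n")
    k = m // n
    # Start from k copies of every agent label, then permute at random.
    Z = [i for i in range(1, n + 1) for _ in range(k)]
    random.Random(seed).shuffle(Z)
    return Z

Z = complete_randomization(m=4, n=2)
```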

2.2 Treatment selection game 

The set of actions $\mathcal{A}_i \subseteq \mathcal{A}$ denotes the feasible action space for agent $i$, where $\mathcal{A}$ is the set
of all possible actions. Subsequent to treatment assignment, every agent $i$ simultaneously
selects an action $A_i \in \mathcal{A}_i$, which corresponds to a version of the treatment administered by
agent $i$. The same version is applied to all units assigned to agent $i$.^2 Let $A = (A_1, \ldots, A_n)$
denote the joint action profile, and $A_{-i} = (A_1, \ldots, A_{i-1}, A_{i+1}, \ldots, A_n)$ denote the action
profile without $i$’s action.

We refer to this stage of the process as the treatment selection game in order to emphasize
that agents (i.e., the treatments) can be strategic in selecting the treatment version

2 In Appendix A we introduce multiple blocks and allow an agent to pick a different action for each block.
All units within a block receive the same treatment version, but versions might differ across blocks. 
they administer to units. This differentiates our setting from traditional experimental design,
because it allows multiple versions of the same treatment to be available, hidden to
the experimenter, and subject to selection by strategic agents. The traditional setting of
experimental design is recovered if all action spaces of all agents are singletons, i.e., there is
only one treatment version for each agent.^3


2.3 Outcomes 


Subsequent to the treatment selection game, an outcome is measured on each experimental
unit $u$. Generally, the potential outcome of unit $u$, denoted by $Y_u(Z, A)$, is the outcome that
will be observed under assignment $Z$ and agent actions $A$. We assume that outcomes are
numerical values, e.g., expenditure in dollars, number of product purchases, etc.

However, only one potential outcome can be observed in any given experiment, depending
on the realized assignment $Z$ and actions $A$, while the rest will be missing. To emphasize the
difference between potential outcomes and observed outcomes, we use additional notation.
Let $Y_{ui}^{obs}$ denote the observed outcome on unit $u$ that was assigned to agent $i$. The notation
$Y_{ui}^{obs}$ implies that $u$ was assigned to $i$ (i.e., $Z_u = i$), and it is undefined if $Z_u \neq i$, i.e., if $u$
was not assigned to $i$. Following a “dot-notation,” $Y_{\cdot i}^{obs}$ denotes the $k \times 1$ vector of observed
outcomes of units assigned to agent $i$, and $Y^{obs}$ denotes the $m \times 1$ vector of observed outcomes
of all units.

Note the dependence of potential outcomes on the complete assignment vector $Z$; this
allows the outcome of unit $u$ to depend on the assignment $Z_{u'}$ of some other unit $u'$, even
when agent actions $A$ are held fixed. This situation is reasonable, for example, when units
form social networks and influence each other, and is generally known as social network
interference (Toulis and Kao). In our setting, interference between units affects the
actions agents take (treatment versions), which then affect the interference on units, and so
on. We collectively refer to this situation as strategic interference.^4

We now illustrate the notation with an example application in viral marketing, which we 
will reuse throughout this paper. 


Example 1. Assume four units $U = \{1, 2, 3, 4\}$ in a single block, say, undergraduate
students, and two marketing agents $X = \{1, 2\}$. Further assume that units 1 and 2 are close
friends and units 3 and 4 are close friends. The experimenter wants to understand which agent
is better at advertising to students. Assume a treatment assignment $Z = (1, 2, 1, 2)^\top$, i.e.,


3 Dealing with multiple hidden treatments remains an open problem in traditional experimental design
and causal inference, although not in a game-theoretic setting as ours, and it is typically assumed away, for
example, through SUTVA [10].

4 There exists work in experimental design with between-unit interference (David and Kempton [8]),
although not under a strategic interference setting as ours. In this paper, we will not be concerned with such
forms of interference, but it will be the focus of future work. There is also related work in estimation of
treatment effects in the context of strategic agents. For example, Athey et al. [1] and Toulis and Parkes
evaluate mechanisms in terms of their revenue, under the causal framework of potential outcomes. In
both papers, the treatments are two different mechanism formats, and the units are the agents competing in
the mechanism. The present work differs because, under our framework, the treatments are in fact strategic
agents that are evaluated through an experiment, whereas the units passively exhibit treatment outcomes.
See, also, the discussion by Dash [7] on the challenges of causal inference in dynamical systems within a
different causal framework, namely causal graphs [9].
units 1, 3 are assigned to agent 1, and units 2, 4 to agent 2. Each agent has two actions
(treatment versions): advertise through phone or through social media. The action sets are
thus $\mathcal{A}_1 = \mathcal{A}_2 = \{\text{phone}, \text{social}\}$, and a possible action profile is $A = (\text{phone}, \text{social})^\top$ with
$A_1 = \text{phone}$ (agent 1 uses phone to reach units 1 and 3) and $A_2 = \text{social}$ (agent 2 uses social
media to reach units 2 and 4).

The potential outcome $Y_u(Z, A)$ could denote the number of product purchases (integer
outcome) made by unit u, or the net profit from advertising to unit u (continuous outcome). 
Dependence on the assignment and treatment versions of both agents is reasonable because 
there could be word-of-mouth effects between students. 

Consider observed data $Y^{obs} = (0, 1, 4, 1)^\top$; for example, $Y_{31}^{obs} = 4$, which indicates that
unit 3 was assigned to agent 1 and purchased four product items; $Y_{32}^{obs}$ is undefined because
the outcome of unit 3 when assigned to agent 2 is not observed. To illustrate the dot-notation,
$Y_{\cdot 1}^{obs} = (0, 4)^\top$ indicates the outcomes of units assigned to agent 1, and $Y_{\cdot 2}^{obs} = (1, 1)^\top$ indicates
the outcomes for agent 2.

In Example 1, the experimenter might be tempted to declare agent 1 as the winner,
because it achieves $\bar{Y}_{\cdot 1}^{obs} = 2.0$ purchases/unit, as opposed to $\bar{Y}_{\cdot 2}^{obs} = 1.0$ purchases/unit
for agent 2. However, these sample averages are subject to random variability from the
randomization in the experiment, and may result from actions that are not the natural
actions of the agents. Therefore, it is unclear whether the sample averages actually estimate
how agents would do if they were selecting treatments without competition.
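
The dot-notation and the sample averages of Example 1 can be reproduced with a short sketch. The helper name `outcomes_by_agent` is an assumption for illustration; the data are the assignment and observed outcomes given in the example.

```python
from statistics import mean

def outcomes_by_agent(Z, Y_obs):
    """Group observed outcomes by the agent each unit was assigned to."""
    groups = {}
    for z_u, y_u in zip(Z, Y_obs):
        groups.setdefault(z_u, []).append(y_u)
    return groups

Z = [1, 2, 1, 2]        # units 1, 3 go to agent 1; units 2, 4 go to agent 2
Y_obs = [0, 1, 4, 1]    # observed purchases per unit

groups = outcomes_by_agent(Z, Y_obs)              # per-agent outcome vectors
means = {i: mean(ys) for i, ys in groups.items()}  # per-agent sample averages
```

Here `groups[1]` plays the role of the outcome vector for agent 1 and `means[1]` its sample average. The code says nothing about whether these averages reflect natural actions, which is exactly the concern raised in the text.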

2.4 Estimand and estimators 

A principled approach is to define the quantity of interest to the experimenter, the estimand,
and then devise appropriate estimators for that quantity. The estimand is the agent with best
possible performance, and thus we need a concrete notion of performance. For this, we want
to estimate how good an agent’s action would be if it were played without competition and
thus without strategic interference. This is important because, ultimately, the experimenter
wants to assign a contract (e.g., an advertising campaign) to the winning agent, after which
the winner will act by itself.

Let’s define the performance of agent $i$ with respect to its action $\alpha_i$, denoted by $\chi(\alpha_i)$, as

$$\chi(\alpha_i) = E\big(Y_u(Z, A) \mid A = \alpha_i \mathbf{1},\ Z_u = i\big); \tag{1}$$

the notation $A = \alpha_i \mathbf{1}$ denotes the hypothetical situation where all agents other than agent $i$ are
replaced by “replicates” of $i$, and each replicate plays action $\alpha_i$. The dependence of $\chi(\alpha_i)$
on agent index $i$ will be implicit in the notation. Given assignment vector $Z$ and actions $A$,
we assume that the distribution of potential outcomes is known to all agents.

The expectation in Eq. (1) is taken with respect to this distribution, and defines the
quantity of interest to the experimenter because it captures how agent $i$ would do, on average,
if the agent were acting alone without competition.^5 We also refer to $\chi$ as the performance
function, and define $\chi(A) = (\chi(A_1), \chi(A_2), \ldots, \chi(A_n))^\top$. For brevity, all following definitions

5 In causal inference, Eq. (1) is a superpopulation estimand, where the experimental units are assumed
to be a random sample from a superpopulation of units, which is the target of statistical inference. The 
for an agent, e.g., natural action, quality, etc., will be implicitly assumed to be stated with
respect to a particular performance function $\chi$.

The natural action of agent i is the action that maximizes the quantity of interest to 
the experimenter in a system where agent i acts alone without competition. In particular, 
the natural action of agent $i$, denoted by $A_i^*$, is defined as the action that maximizes its
performance, i.e.,

$$A_i^* = \arg\max_{\alpha_i \in \mathcal{A}_i} \{\chi(\alpha_i)\}. \tag{2}$$


The natural action profile is denoted by $A^* = (A_1^*, A_2^*, \ldots, A_n^*)$. The quality of agent $i$,
denoted by $\chi_i^* \in \mathbb{R}$, is the maximum performance that the agent can achieve, i.e., $\chi_i^* = \chi(A_i^*)$.
The estimand, denoted by $\tau$, is the agent of highest quality, i.e.,

$$\tau = \arg\max_{i \in X} \{\chi_i^*\}. \tag{3}$$
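
For finite action spaces, the natural action, the quality and the estimand of Eqs. (2)-(3) reduce to a few maximizations. The sketch below is only an illustration: the performance values in `performance` are hypothetical numbers, and in practice the performance function is not known to the experimenter.

```python
def natural_action(action_space, chi_i):
    """Eq. (2): the action in the agent's action space with maximum performance."""
    return max(action_space, key=chi_i)

# Hypothetical performance of each action, per agent (not taken from the paper).
performance = {
    1: {"phone": 1.2, "social": 2.0},
    2: {"phone": 1.5, "social": 0.8},
}

# Natural action and quality of every agent.
natural = {i: natural_action(list(perf), perf.get) for i, perf in performance.items()}
quality = {i: performance[i][natural[i]] for i in performance}

# Eq. (3): the estimand is the agent of highest quality.
tau = max(quality, key=quality.get)
```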


To estimate the agent of highest quality, the experimenter needs to use the observed
outcomes $Y^{obs}$. We will assume that the experimenter uses a score function $\phi : \mathbb{R}^m \to \mathbb{R}^n$,
mapping all outcomes to an $n \times 1$ vector of scores, one for each agent, denoted by $\phi_i$ for agent $i$.
For convenience, we will write $\phi(Y^{obs}) = (\phi_1(Y^{obs}), \phi_2(Y^{obs}), \ldots, \phi_n(Y^{obs}))^\top$.

In the experiment, agents will be evaluated according to their scores, and the winner is
the agent with the highest score. Several options for the score functions are possible. For
example, $\phi_i(Y^{obs}) = \bar{Y}_{\cdot i}^{obs}$, the sample mean of outcomes of units assigned to agent $i$, is one
choice for the score function; other choices are possible, e.g., the sample Sharpe ratio, the
sample median, etc.

The key challenge in incentive-compatible experimental design is to align maximizing
the probability of winning the experiment, as induced in part by the score function $\phi$, with
selecting the action with maximum performance, i.e., the natural action.

2.5 Incentive-compatible experiment designs 

Let’s first define an experiment design using the concepts of estimand and estimators from
Section 2.4.

Definition 2.1. An experiment design $\mathcal{D} = (\psi, \phi)$ operates in the following steps:

1. Receives units $U$ and agents $X$ as input.

2. Samples a treatment assignment $Z$ according to $\psi$.

3. Each agent $i$ picks a treatment version $A_i$, and administers the treatment to the set of
its assigned units, $\{u \in U : Z_u = i\}$.

expectation in Eq. (1) is thus over all units in the superpopulation and all treatment assignments, for fixed
agent actions. Other estimands in that superpopulation are possible; for example, the experimenter might
be interested in the median outcomes, $\mathrm{med}(Y_u(Z, A))$, or the Sharpe ratio, $E(Y_u(Z, A))/\mathrm{SD}(Y_u(Z, A))$, all
conditional on fixed actions as in Eq. (1). In this paper, we work under the estimand of Eq. (1), mainly for
simplicity; however, our theory applies to all aforementioned estimands as well.
4. Outcomes on units, $Y^{obs}$, are observed.

5. The winning agent $\hat{\tau}$ is declared according to the rule

$$\hat{\tau}(Y^{obs}) = \arg\max_{i \in X} \{\phi_i(Y^{obs})\}. \tag{4}$$
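
The five steps of Definition 2.1 can be sketched as a small simulation. Everything model-specific here is an assumption made for illustration: the function names, the Gaussian outcome model, and the encoding of an action as a single mean value. Only the order of the steps follows the definition.

```python
import random
from statistics import mean

def complete_randomization(m, agents, rng):
    """Step 2: assign exactly k = m / n units to each agent, in random order."""
    k = m // len(agents)
    Z = [i for i in agents for _ in range(k)]
    rng.shuffle(Z)
    return Z

def sample_mean_score(Z, Y_obs, agents):
    """One possible score function: per-agent sample mean of observed outcomes."""
    return {i: mean([y for z, y in zip(Z, Y_obs) if z == i]) for i in agents}

def run_experiment(m, agents, choose_action, outcome, score, seed=0):
    rng = random.Random(seed)
    Z = complete_randomization(m, agents, rng)          # step 2: assignment
    A = {i: choose_action(i) for i in agents}           # step 3: treatment versions
    Y_obs = [outcome(Z[u], A, rng) for u in range(m)]   # step 4: observed outcomes
    scores = score(Z, Y_obs, agents)
    winner = max(agents, key=lambda i: scores[i])       # step 5: rule of Eq. (4)
    return winner, scores

# Hypothetical setup: an action is a mean outcome, and each unit's outcome is
# Gaussian around the mean chosen by the agent it was assigned to.
actions = {1: 1.0, 2: 5.0}
winner, scores = run_experiment(
    m=8,
    agents=[1, 2],
    choose_action=actions.get,
    outcome=lambda agent, A, rng: rng.gauss(A[agent], 0.1),
    score=sample_mean_score,
)
```

In this toy setup the outcome of a unit depends only on the action of its own agent, so there is no interference; a richer `outcome` callable could use the whole profile `A`.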


Given experiment design $\mathcal{D}$ and action profile $A$, the probability $P_i(A \mid \mathcal{D})$ that agent $i$
wins the experiment is given by:

$$\Pr\big(\hat{\tau}(Y^{obs}) = i \mid A, \mathcal{D}\big) = P_i(A \mid \mathcal{D}) = P_i(A_i, A_{-i} \mid \mathcal{D}). \tag{5}$$

The randomness in Eq. (5) comes from the randomness of observed data $Y^{obs}$, and the
randomization in the treatment assignment. The winning probability $P_i(\cdot \mid \mathcal{D})$ in Eq. (5) is
the expected utility of agent $i$ under action profile $A$, because agents care only about winning
the experiment.

Definition 2.2 (Incentive-compatible experiment design). An experiment design $\mathcal{D} = (\psi, \phi)$
is incentive-compatible if the natural action $A_i^*$ is a dominant strategy for each agent $i$, i.e.,
it maximizes the probability (5) of winning the experiment regardless of other agents’ actions,
such that

$$\arg\max_{\alpha_i \in \mathcal{A}_i} \{P_i(\alpha_i, A_{-i} \mid \mathcal{D})\} = A_i^*, \tag{6}$$

for all actions $A_{-i}$, and every agent $i$.
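
When action spaces are finite and the winning probabilities are available in closed form, the condition of Definition 2.2 can be checked by brute-force enumeration. The sketch below makes several assumptions: `win_prob` is a hypothetical callable returning the winning probability of an agent under a full action profile, the toy probabilities are invented for the example, and the check accepts ties (the natural action only needs to attain the maximum).

```python
from itertools import product

def is_incentive_compatible(action_spaces, natural, win_prob, tol=1e-12):
    """Check that each agent's natural action maximizes its winning probability
    against every combination of the other agents' actions."""
    agents = sorted(action_spaces)
    for i in agents:
        others = [j for j in agents if j != i]
        for other_actions in product(*(action_spaces[j] for j in others)):
            profile = dict(zip(others, other_actions))

            def p(a_i):
                return win_prob(i, {**profile, i: a_i})

            best = max(p(a) for a in action_spaces[i])
            if p(natural[i]) < best - tol:
                return False
    return True

# Toy setup: an action is its own performance, so the natural action is the largest.
action_spaces = {1: [1.0, 2.0], 2: [1.0, 3.0]}
natural = {1: 2.0, 2: 3.0}

def share(i, profile):
    """Winning probability increasing in own performance."""
    return profile[i] / sum(profile.values())

def inverse_share(i, profile):
    """Winning probability decreasing in own performance."""
    return 1.0 - share(i, profile)

ic_share = is_incentive_compatible(action_spaces, natural, share)
ic_inverse = is_incentive_compatible(action_spaces, natural, inverse_share)
```

The first toy rule rewards higher performance and passes the check; the second penalizes it and fails, which mirrors the monotonicity argument that links winning probability to performance.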

Remark. In an incentive-compatible experiment, the score function $\phi$ induces a probability
of winning (5) that is monotonically increasing with the performance function $\chi$ that the
experimenter cares about. If this monotonicity holds, an agent will prefer to play the action
that maximizes its performance (i.e., the natural action), because this will also maximize
the winning probability.

The notation is summarized in Table 1. We now return to the viral marketing problem
that was introduced in Example 1. Examples 2(a)-(c) deal with Normally-distributed
outcomes, whereas Examples 3(a)-(g) deal with Poisson-distributed outcomes. Examples
3(c)-(g) deal specifically with the problem of interference, and work with a more realistic
form of the viral marketing problem.

Example 2(a). Normal outcomes.^6 Consider the viral marketing problem of Example
1, with multiple units and two agents, where the outcomes of interest are the profit achieved
from advertising to each unit. We assume that an agent action $\alpha_i = (\mu_i, \sigma_i^2) \in \mathbb{R} \times \mathbb{R}^+$
determines the mean and variance of the profit from advertising to unit $u$, such that, given
assignment $Z$ and actions $A$,

$$Y_u(Z, A) \sim N(\mu_i, \sigma_i^2), \quad \text{if } A_i = \alpha_i,\ Z_u = i. \tag{7}$$

6 This two-agent example (low-quality agent vs. high-quality agent) is different from the example in the 
original paper published at EC’2015. The example was edited to illustrate a scenario where the low-quality 
agent prefers to play an action that is not its natural action and also reduces the winning chances of the 
high-quality agent. In the example of the original paper, the deviation from the low-quality agent actually 
increased the chances of the high-quality agent. 



Table 1: Notation for incentive-compatible experimental design

Symbol | Description | Value/Domain
$U$ | Set of $m$ units | $\{1, 2, \ldots, m\}$
$X$ | Set of $n$ agents | $\{1, 2, \ldots, n\}$
$Z_u$ | Treatment assignment of unit $u$ | $Z_u \in X$
$Z$ | Vector of treatment assignment ($m \times 1$) | $(Z_1, \ldots, Z_m)^\top$
$k$ | Units per agent | $k = m/n$
$\mathcal{A}$ | Generic action space |
$\mathcal{A}_i$ | Action space of agent $i$ | $\mathcal{A}_i \subseteq \mathcal{A}$
$A_i$ | Action of agent $i$ | $A_i \in \mathcal{A}_i$
$A$ | Complete action profile ($n \times 1$) | $(A_1, \ldots, A_n)^\top$
$Y_u(Z, A)$ | Potential outcome of unit $u$ under assignment $Z$, actions $A$ | $Y_u(Z, A) \in \mathbb{R}$
$Y_{ui}^{obs}$ | Observed outcome for unit $u$ assigned to agent $i$ |
$Y_{\cdot i}^{obs}$ | Vector of observed outcomes of units assigned to agent $i$ ($k \times 1$) | $Y_{\cdot i}^{obs} \in \mathbb{R}^k$
$Y^{obs}$ | Vector of observed outcomes of all units ($m \times 1$) | $Y^{obs} \in \mathbb{R}^m$
$\chi(\alpha_i)$ | Performance of agent $i$ playing action $\alpha_i$ | $\chi(\alpha_i) \in \mathbb{R}$
$\chi(A)$ | Vector of performances ($n \times 1$) | $(\chi(A_1), \ldots, \chi(A_n))^\top$
$A_i^*$ | Natural action of agent $i$ (maximizes performance) | $A_i^* \in \mathcal{A}_i$
$\chi_i^*$ | Quality of agent $i$ (performance at natural action) | $\chi_i^* \in \mathbb{R}$
$\tau$ | Agent of highest quality | $\tau \in X$
$\phi_i(Y^{obs})$ | Score of agent $i$ | $\phi_i(Y^{obs}) \in \mathbb{R}$
$\phi(Y^{obs})$ | Vector of agent scores ($n \times 1$) | $(\phi_1(Y^{obs}), \ldots, \phi_n(Y^{obs}))^\top$
$\hat{\tau}(Y^{obs})$ | Estimated agent of highest quality (agent with maximum score) | $\hat{\tau}(Y^{obs}) \in X$
$P_i(A \mid \mathcal{D})$ | Probability agent $i$ wins under design $\mathcal{D}$, given fixed actions $A$ |


The profit defined in Eq. (7) can be negative because we assume implicit advertisement costs.
Furthermore, Eq. (7) implies no interference between units, and no strategic interference
between agent actions. We will make this precise in Section 3.

The experimenter is interested only in expected profit, ignoring the risk. Thus, the
performance of action $\alpha_i = (\mu_i, \sigma_i^2)$ of agent $i$ is

$$\chi(\alpha_i) = E\big(Y_u(Z, A) \mid A = \alpha_i \mathbf{1},\ Z_u = i\big) = \mu_i. \tag{8}$$


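Hence, the quality $\chi_i^*$ of agent $i$ is the maximum $\mu_i$ the agent can achieve over its action
space $\mathcal{A}_i$. Now, consider an experiment design $\mathcal{D} = (\psi, \phi)$, where the score function $\phi$ is
defined as $\phi_i(Y^{obs}) = \bar{Y}_{\cdot i}^{obs}$, i.e., the score of agent $i$ is the sample mean profit from all units
assigned to agent $i$. Ignoring ties, the winning agent is given using Eq. (4):

$$\hat{\tau}(Y^{obs}) = \begin{cases} 1, & \text{if } \bar{Y}_{\cdot 1}^{obs} > \bar{Y}_{\cdot 2}^{obs}, \\ 2, & \text{if } \bar{Y}_{\cdot 1}^{obs} < \bar{Y}_{\cdot 2}^{obs}. \end{cases} \tag{9}$$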
By Eq. (7), $\bar{Y}_{\cdot i}^{obs} \sim N(\mu_i, \sigma_i^2/k)$, where $k$ is the number of units per agent. Hence, the
probability that agent 1 wins is

$$P_1(A \mid \mathcal{D}) = \Pr\big(\hat{\tau}(Y^{obs}) = 1 \mid A, \mathcal{D}\big) = P\big(\bar{Y}_{\cdot 1}^{obs} > \bar{Y}_{\cdot 2}^{obs}\big) = \Phi\Big(\sqrt{k}\, \frac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2 + \sigma_2^2}}\Big), \tag{10}$$

where $\Phi$ is the normal cumulative distribution function (CDF). This design is not
incentive-compatible because the winning probability $P_1(A \mid \mathcal{D})$ is not monotone with performance
$\chi(\alpha_1) = \mu_1$ for action $\alpha_1 = (\mu_1, \sigma_1^2)$. For example, an increase in $\mu_1$ may be associated with
an increase in the risk $\sigma_1^2$, such that the probability of winning is reduced.

To see this, assume there are only two actions for agent 1, which induce mean and variance
$\mathcal{A}_1 = \{(1.5, 100), (2, 20)\}$, and only one action for agent 2, $\mathcal{A}_2 = \{(9, 1)\}$. The quality
of agent 1 is $\chi_1^* = \max\{\mu : (\mu, \sigma^2) \in \mathcal{A}_1\} = 2$ and thus $(2, 20)$ is agent 1’s natural action.
However, when agent 1 plays the natural action, its winning probability is approximately
equal to 0.12, whereas action $(1.5, 100)$ yields a winning probability of approximately 0.364.
When agent 1 does not play the natural action, the expected value of its outcomes is
reduced but their variance is increased, thus overall increasing agent 1’s chances to win the
experiment. Therefore, this experiment is not incentive-compatible since agent 1 prefers not
to play the natural action.
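
The comparison above can be explored numerically with the closed form of Eq. (10), in which the winning probability of agent 1 is $\Phi(\sqrt{k}\,(\mu_1 - \mu_2)/\sqrt{\sigma_1^2 + \sigma_2^2})$. The sketch below is an illustration under stated assumptions: the second component of each action is read as a variance, and the number of units per agent `k` is set to 1 because the text does not state it, so the computed values need not match the probabilities quoted in the example. Only the ordering of the two probabilities is of interest.

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def win_prob_agent1(a1, a2, k):
    """Eq. (10): probability that agent 1 has the larger sample mean.

    a1, a2: (mean, variance) actions of agents 1 and 2
    k:      number of units per agent
    """
    (mu1, var1), (mu2, var2) = a1, a2
    return normal_cdf(sqrt(k) * (mu1 - mu2) / sqrt(var1 + var2))

k = 1  # assumption: not given in the text
natural, deviation, rival = (2.0, 20.0), (1.5, 100.0), (9.0, 1.0)

p_natural = win_prob_agent1(natural, rival, k)
p_deviation = win_prob_agent1(deviation, rival, k)
```

Under these assumptions `p_deviation` exceeds `p_natural`: the lower-mean, higher-variance action gives the weaker agent a better chance of beating the sample mean of the stronger one.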


Example 2(b). Normal outcomes, high risk/reward. Continuing Example 2(a),
let’s suppose that the variance of the unit’s outcome satisfies $\sigma_i^2 = \mu_i^4$, indicating a delicate
trade-off between expected return and risk. The probability that agent 1 wins is easily
obtained from (10) as

$$P_1(A \mid \mathcal{D}) = P\big(\bar{Y}_{\cdot 1}^{obs} > \bar{Y}_{\cdot 2}^{obs}\big) = \Phi\Big(\sqrt{k}\, \frac{\mu_1 - \mu_2}{\sqrt{\mu_1^4 + \mu_2^4}}\Big). \tag{11}$$

The experiment design is still not incentive-compatible because (11) is not monotonically
increasing in $\mu_1$. As before, the better agent will choose to be more conservative, and will
not reveal its quality (the maximum possible $\mu_1$). However, we will show in Section 3 that an
incentive-compatible design can be achieved through the score function $\phi_i(Y^{obs}) = -1/\bar{Y}_{\cdot i}^{obs}$,
i.e., the negative reciprocal of the sample mean profit. We will show that, with this score
function, the risk-reward trade-off in (11) disappears, which allows the experimenter to
estimate agents’ qualities.
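
A rough numerical sketch of why the negative reciprocal helps, assuming the outcome variance equals $\mu_i^4$ and positive means. By the Delta method with $g(y) = -1/y$, we have $g'(\mu) = 1/\mu^2$, so the variance of $g(\bar{Y})$ is approximately $(1/\mu^4)(\mu^4/k) = 1/k$, which no longer depends on the action. The resulting approximation of the winning probability, $\Phi(\sqrt{k}\,(1/\mu_2 - 1/\mu_1)/\sqrt{2})$, is derived here for illustration and is not quoted from the paper; the function names are also assumptions.

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def win_prob_sample_mean(mu1, mu2, k):
    """Sample-mean score when each outcome has variance mu_i ** 4 (assumed)."""
    return normal_cdf(sqrt(k) * (mu1 - mu2) / sqrt(mu1 ** 4 + mu2 ** 4))

def win_prob_neg_reciprocal(mu1, mu2, k):
    """Delta-method approximation for the score -1 / (sample mean).

    Each transformed score is treated as roughly N(-1 / mu_i, 1 / k),
    so the difference of the two scores has variance 2 / k.
    """
    return normal_cdf(sqrt(k) * (1.0 / mu2 - 1.0 / mu1) / sqrt(2.0))

k, mu2 = 4, 1.0
grid = [1.5, 3.0, 6.0]
by_mean = [win_prob_sample_mean(mu1, mu2, k) for mu1 in grid]
by_reciprocal = [win_prob_neg_reciprocal(mu1, mu2, k) for mu1 in grid]
```

On this grid the sample-mean probability rises and then falls as $\mu_1$ grows, whereas the approximation for the transformed score keeps increasing, so a higher mean is never penalized.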


Example 3(a). Poisson outcomes. Now suppose the outcomes are integer-valued,
e.g., representing the number of purchases. In this case, we assume that an agent’s action
$\alpha_i = (\lambda_i) \in \mathbb{R}^+$ determines the purchase rate by unit $u$, such that, given assignment $Z$ and
actions $A$,

$$Y_u(Z, A) \sim \mathrm{Pois}(\lambda_i), \quad \text{if } A_i = \alpha_i,\ Z_u = i. \tag{12}$$

As in Eq. (JTJ) of Example 2(a), Eq. (fT2l) implies no interference. Let’s suppose the 
experimenter is interested in performance that is the expected purchase rate. Thus, using 
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Eq. (IT]) . the experimenter measures performance of action cq = (Aj) of agent i, through 

X(cq) = E (Y U (Z, A) | A = cql, Z u = i) = (13) 

Hence, the quality y* of agent i is the maximum purchase rate A; that the agent can achieve 
over its action space Ai- Now, consider the experiment design T> = (-0,0), where the score 
function 0 is defined as 0j(H obs ) = V° bs , i.e., the score of agent i is the sample mean purchase 
rate from all units assigned to agent i. Ignoring ties, the winning agent f(l' obs ) is given using 

Eq. ([9]). By the central limit theorem, 7 t obs —¥ A i/k), where denotes convergence 

in distribution, and k is the number of units per agent. The probability that agent 1 wins 
is, asymptotically, 

Pi (A I'D) = P(yp > yip) = $(Vk Xl = X L ). (14) 

v Ai + A 2 

This design is incentive-compatible because the winning probability $P_1(A \mid \mathcal{D})$ is monotone
in the agent performance; for example, an increase in $\lambda_1$ incurs a larger increase in the
numerator of Eq. (14) than in the denominator. By symmetry, the winning probability for
agent $i$ is maximized at its natural action.
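
The monotonicity claim can be checked numerically. The following is a minimal sketch that evaluates the asymptotic winning probability of Eq. (14) for several purchase rates of agent 1; the function names and the chosen values of $k$ and $\lambda$ are illustrative assumptions, not part of the original analysis.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def win_prob_sample_mean(lam1: float, lam2: float, k: int) -> float:
    """Asymptotic probability that agent 1 wins under the sample-mean score, Eq. (14)."""
    return normal_cdf(math.sqrt(k) * (lam1 - lam2) / math.sqrt(lam1 + lam2))

# Assumed toy values: agent 2 is fixed at rate 3 and there are k = 20 units per agent.
rates_agent_1 = (1.0, 2.0, 3.0, 4.0, 5.0)
probs = [win_prob_sample_mean(lam1, 3.0, 20) for lam1 in rates_agent_1]
print([round(p, 3) for p in probs])
```

Under these assumed values the printed probabilities increase with the rate of agent 1, so agent 1 has no reason to play anything other than its highest achievable rate.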

In Section 4.1 we will show that a more powerful design is possible, i.e., there exists
an experiment design $\mathcal{D}'$ that is incentive-compatible and also guarantees higher winning
chances to the better agent.

The examples highlight the challenges in incentive-compatible experimental design. The
experimenter is interested in some quality of an agent (e.g., expected return), but a given
design may fail to incentivize agents to play in a way that reveals their qualities. The problem
arises from a mismatch between the score function $\phi$, which is used to declare the winner and
thereby induces a non-cooperative game, and the performance function $\chi$, which is of interest
to the experimenter.

Compared with classical mechanism design theory, incentive-compatible experimental
design differs in that: 

• In mechanism design, the private information is an agent’s preferences, whereas here 
the private information is an agent’s quality (i.e., the performance of its natural action). 

• In mechanism design, there may be side payments that can be made, whereas here the 
incentives are winner-take-all and depend on the outcome of the experiment. 

• In mechanism design, it is standard to appeal to the revelation principle and design 
a direct-revelation mechanism, in which agents report their preference type to the 
mechanism. In comparison, the agents in our setting select an action and the designer 
observes the effect of this action, but not the action itself. 


3 Theory of incentive-compatible experimental design 

In this section we prove our main result, which provides a construction of score functions to 
design incentive-compatible experiments. The proof relies on the existence of statistics that 
can estimate the individual agent performances $\chi(A_i)$, as the number of units grows large.

Definition 3.1 (Identifiable performance, identifying statistic). An experiment design
$\mathcal{D} = (\psi, \phi)$ has identifiable performance $\chi$ if, for every fixed action profile $A$, there exists a
statistic $T : \mathbb{R}^m \to \mathbb{R}^n$ calculated over data $Y^{\mathrm{obs}}$, such that

$$\sqrt{k}\,\left(T(Y^{\mathrm{obs}}) - \chi(A)\right) \xrightarrow{d} \mathcal{N}(0, \Sigma(A)), \tag{15}$$

as the number of units per agent $k$ grows large; $\mathcal{N}$ is the $n$-variate normal distribution, and
$\Sigma(A)$ is the $n \times n$ covariance matrix of $T$ that can depend on $A$. The statistic $T$ is an
identifying statistic for experiment design $\mathcal{D}$.

An identifying statistic is important because it estimates the individual performances 
$\chi(A_i)$, which are the quantities of interest to the experimenter. Although finding such a
statistic is not an easy task, one simple strategy is to use sample quantities, such as averages, 
and then appeal to the central limit theorem, or other large-sample asymptotic results. We 
use this strategy extensively in this paper. 

However, an identifying statistic $T$ calculated over data $Y^{\mathrm{obs}}$ need not be sufficient for
incentive alignment in our winner-take-all experiments. Thus, we consider score functions
defined as $\phi_i(Y^{\mathrm{obs}}) = f(T_i)$, for an appropriate transformation $f : \mathbb{R} \to \mathbb{R}$. The
transformation is used to add flexibility in the design of the score function. Agents will be evaluated
according to the score vector $\phi(Y^{\mathrm{obs}})$. The covariance matrix of the score vector $\phi(Y^{\mathrm{obs}})$ is,
asymptotically, equal to

$$V_f(A) = J_\phi\, \Sigma(A)\, J_\phi^\top, \tag{16}$$

where $J_\phi$ is the Jacobian of $\phi$ calculated at $\chi(A)$, which is a diagonal matrix with elements
$f'(\chi(A_i))$. Whether an experiment design $(\psi, \phi)$ is incentive-compatible or not depends
crucially on the matrix $V_f(A)$, because this matrix defines the variances of the scores used
to evaluate the agents.

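Theorem 3.1. Fix agent actions $A$, and consider design $\mathcal{D} = (\psi, \phi)$ that has an identifying
statistic $T$ with covariance matrix $\Sigma(A)$. Define the score function as $\phi_i(Y^{\mathrm{obs}}) = f(T_i)$, for
some function $f : \mathbb{R} \to \mathbb{R}$, and let $v_{ij}(A)$ be the $ij$th element of $V_f(A)$ defined in Eq. (16).
Also define,

$$v^f_{ij}(\alpha \mid A_{-i}) = v_{ii}(\alpha, A_{-i}) + v_{jj}(\alpha, A_{-i}) - v_{ij}(\alpha, A_{-i}) - v_{ji}(\alpha, A_{-i}). \tag{17}$$

Design $\mathcal{D}$ is incentive-compatible if, for every agent $i$,

$$\arg\max_{\alpha_i \in \mathcal{A}_i} \left\{ \frac{f(\chi(\alpha_i))}{\sqrt{v^f_{ij}(\alpha_i \mid A_{-i})}} \right\} = \arg\max_{\alpha_i \in \mathcal{A}_i} \{\chi(\alpha_i)\} \overset{\mathrm{def}}{=} A_i^*, \tag{18}$$

for every agent $j \neq i$, and all actions $A_{-i}$.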

For a fixed action profile $A$, the element $v^f_{ij}$ in Eq. (17) is the variance of the difference
between the scores of agents $i$ and $j$, $\phi_i(Y^{\mathrm{obs}}) - \phi_j(Y^{\mathrm{obs}})$, as defined in Theorem 3.1. Thus,
the ratio maximized in Eq. (18) governs the probability that agent $i$ has a larger score than
agent $j$, and the condition implies that this probability is maximized at the natural action.

Theorem 3.1 suggests a recipe to construct incentive-compatible experiments, as we
illustrate through examples in the following sections.

• First, one needs to find an identifying statistic to estimate the performances of agents,
i.e., their outcomes without competition. A parametric model for the unit outcomes
together with known asymptotic results, such as the central limit theorem, or the
asymptotic normality of the maximum-likelihood estimator, can provide such an
identifying statistic with known covariance matrix $\Sigma(A)$; see also Appendix D for a relevant
discussion.

• Second, given the identifying statistic, one then needs to find an appropriate
transformation $f$ to satisfy Eq. (18). This transformation can be as simple as the identity
function, as in Example 3(g), or the reciprocal function, as in Example 2(c).
Intuitively, the design goal for $f$ is to make the denominator of (18) less sensitive to agent
actions than the numerator.
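
The two steps of the recipe can be sketched in code. The block below is a minimal illustration, assuming a known covariance matrix and a transformation with known derivative; the helper names and the toy inputs (two agents with Poisson-like variances and the identity transformation) are hypothetical, and the winning probability uses the normal approximation implied by Definition 3.1.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def score_covariance(sigma, f_prime_at_chi):
    """V_f = J Sigma J^T with J = diag(f'(chi_1), ..., f'(chi_n)), as in Eq. (16)."""
    n = len(sigma)
    return [[f_prime_at_chi[i] * sigma[i][j] * f_prime_at_chi[j] for j in range(n)]
            for i in range(n)]

def pairwise_variance(v, i, j):
    """v_ij^f = v_ii + v_jj - v_ij - v_ji, as in Eq. (17)."""
    return v[i][i] + v[j][j] - v[i][j] - v[j][i]

def pairwise_win_prob(f, chi, v, i, j, k):
    """Normal approximation to P(score_i > score_j) with k units per agent."""
    gap = f(chi[i]) - f(chi[j])
    return normal_cdf(math.sqrt(k) * gap / math.sqrt(pairwise_variance(v, i, j)))

# Assumed toy inputs: performances chi, Sigma = diag(chi), and f equal to the identity.
chi = [4.0, 3.0]
sigma = [[4.0, 0.0], [0.0, 3.0]]
v = score_covariance(sigma, f_prime_at_chi=[1.0, 1.0])
print(pairwise_variance(v, 0, 1), pairwise_win_prob(lambda x: x, chi, v, 0, 1, k=20))
```

With these inputs the pairwise variance is the sum of the two rates, so the sketch reduces to the sample-mean design for Poisson outcomes discussed earlier.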

Theorem 3.1 makes no assumption about interference. In the following sections, we will
specialize and apply Theorem 3.1 to the viral marketing example, both with and without
interference.


4 Incentive-compatible experiments without interference


The setting without interference is formally defined through the following assumption. 

Assumption 4.1 (No interference). There is no strategic interference among agents and no
interference between units, i.e., for all assignments $Z$ and all agent actions $A$,

$$Y_u(Z, A) = Y_u(A_i), \quad \text{where } Z_u = i. \tag{19}$$


Assumption 4.1 postulates that the potential outcome $Y_u(Z, A)$ of a unit $u$ assigned to
agent $i$ remains constant as long as agent $i$’s action and unit $u$’s assignment to agent $i$ are
held fixed. Under no interference, the distribution of a score function defined through an
identifying statistic is a univariate normal, as shown in the following proposition.


Proposition 4.1. Consider design $\mathcal{D} = (\psi, \phi)$ with an identifying statistic $T$ with covariance
matrix $\Sigma(A)$. Let $\phi_i(Y^{\mathrm{obs}}) = f(T_i)$, for some function $f : \mathbb{R} \to \mathbb{R}$, and suppose Assumption
4.1 holds. Then, for fixed actions $A$,

$$\sqrt{k}\,\left(\phi_i(Y^{\mathrm{obs}}) - f(\chi(A_i))\right) \xrightarrow{d} \mathcal{N}\left(0, \sigma^2(A_i)\right), \tag{20}$$

where $\sigma^2(A_i) = f'(\chi(A_i))^2 \sigma_i^2$, with $\sigma_i^2$ being the $i$th diagonal element of $\Sigma(A)$.

Proof. By Assumption 4.1 (no interference), the covariance matrix $\Sigma(A)$ of $T$ is diagonal
with elements $\sigma_i^2$. Thus, by definition of the identifying statistic,

$$\sqrt{k}\,\left(T_i - \chi(A_i)\right) \xrightarrow{d} \mathcal{N}(0, \sigma_i^2).$$

Since $\phi_i(Y^{\mathrm{obs}}) = f(T_i)$, Eq. (20) follows from a simple application of the Delta theorem;
see, for example, Bickel and Doksum [3, Chapter 5], or Cox [5]. □

Proposition 4.1 provides the asymptotic distribution of the score function, given an
identifying statistic and a known transformation $f$, when there is no interference. This will be
useful to derive the winning probabilities for agents in the experiment. We first illustrate
Proposition 4.1 and then show how it can be used to simplify the conditions of the more
general Theorem 3.1.

Example 2(c). We continue from Example 2(b), where agent $i$’s action is $A_i = (\mu_i)$,
and $\bar{Y}_i^{\mathrm{obs}} \sim \mathcal{N}(\mu_i, \mu_i^4/k)$, where $k$ is the number of units per agent. The statistic
$T(Y^{\mathrm{obs}}) = (\bar{Y}_1^{\mathrm{obs}}, \bar{Y}_2^{\mathrm{obs}}, \ldots, \bar{Y}_n^{\mathrm{obs}})^\top \overset{\mathrm{def}}{=} T$ is an identifying statistic, since
$\chi(A) = (\mu_1, \mu_2, \ldots, \mu_n)^\top \overset{\mathrm{def}}{=} \mu$, and

$$\sqrt{k}\,(T - \mu) \xrightarrow{d} \mathcal{N}(0, \Sigma), \tag{21}$$

where $\Sigma = \mathrm{diag}(\mu_1^4, \ldots, \mu_n^4)$ is the diagonal matrix with elements $\mu_i^4$.

Consider the score functions $\phi_i(Y^{\mathrm{obs}}) = 1/T_i = 1/\bar{Y}_i^{\mathrm{obs}}$, i.e., $f(x) = 1/x$, in the notation
of Proposition 4.1. Using the result in Proposition 4.1, $\sigma^2(A_i) = f'(\mu_i)^2 \mu_i^4 = 1$, and thus

$$\sqrt{k}\,\left(\phi_i(Y^{\mathrm{obs}}) - 1/\mu_i\right) \xrightarrow{d} \mathcal{N}(0, 1). \tag{22}$$


The variance of the score function in Eq. (22) is stabilized. The following theorem shows
that such variance stabilization can lead to incentive-compatible designs, when there is no
interference.
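
The stabilization in Eq. (22) can be illustrated with a small Monte Carlo check. This is a sketch under stated assumptions: the sample mean is drawn directly from the normal model $\mathcal{N}(\mu, \mu^4/k)$ of Example 2(c), and the function name, seed, and values of $\mu$, $k$, and the number of draws are arbitrary illustrative choices.

```python
import math
import random
import statistics

def scaled_score_variances(mu, k, draws, rng):
    """Empirical variances of sqrt(k)*(Ybar - mu) and sqrt(k)*(1/Ybar - 1/mu)."""
    raw, reciprocal = [], []
    for _ in range(draws):
        # Sample mean drawn from N(mu, mu^4 / k), i.e. standard deviation mu^2 / sqrt(k).
        ybar = rng.gauss(mu, mu ** 2 / math.sqrt(k))
        raw.append(math.sqrt(k) * (ybar - mu))
        reciprocal.append(math.sqrt(k) * (1.0 / ybar - 1.0 / mu))
    return statistics.variance(raw), statistics.variance(reciprocal)

rng = random.Random(0)
for mu in (0.5, 1.0, 2.0):
    var_raw, var_recip = scaled_score_variances(mu, k=10_000, draws=20_000, rng=rng)
    print(mu, round(var_raw, 3), round(var_recip, 3))
```

The first variance should track $\mu^4$ and therefore change with the action, while the second should stay close to one for every $\mu$, which is the sense in which the reciprocal score removes the dependence of the noise on the action.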

Theorem 4.1. Consider design $\mathcal{D} = (\psi, \phi)$ with an identifying statistic $T$ with covariance
matrix $\Sigma(A)$. Suppose Assumption 4.1 holds. If, for every agent $i$,

$$\phi_i(Y^{\mathrm{obs}}) = f(T_i), \quad \text{where } f : \mathbb{R} \to \mathbb{R}, \tag{23}$$

$$\mathrm{Var}\left(\phi_i(Y^{\mathrm{obs}})\right) = \text{const.}, \tag{24}$$

$$\arg\max_{\alpha_i \in \mathcal{A}_i} f(\chi(\alpha_i)) = \arg\max_{\alpha_i \in \mathcal{A}_i} \{\chi(\alpha_i)\} = A_i^*, \tag{25}$$

then design $\mathcal{D}$ is incentive-compatible.

Condition (24) is related to variance-stabilizing transformations in statistics, which also
play an important role in hypothesis testing; we discuss this relationship in Appendix C.

Example 2(d). Normal outcomes, high risk/reward. Continuing from Example
2(c), we consider the high risk-reward setting of the viral marketing problem, where an
agent’s action is to pick an expected return, i.e., $A_i = (\mu_i)$, and the winning probability is
given by

$$P_1(A \mid \mathcal{D}) = \Phi\left(\sqrt{k}\, \frac{\mu_1 - \mu_2}{\sqrt{\mu_1^4 + \mu_2^4}}\right). \tag{26}$$


The performance function is $\chi(\alpha_i) = \mu_i$, and thus the natural action is
$A_i^* = \arg\max_{\alpha_i \in \mathcal{A}_i} \{\alpha_i\}$. It was shown that design $\mathcal{D}$ in Example 2(b), which uses the
sample mean as the score function, is not incentive-compatible. Consider instead a design $\mathcal{D}'$
with score function $\phi_i(Y^{\mathrm{obs}}) = -1/\bar{Y}_i^{\mathrm{obs}}$. Using the result of Example 2(c),

$$\sqrt{k}\,\left(\phi_i(Y^{\mathrm{obs}}) - (-1/\mu_i)\right) \xrightarrow{d} \mathcal{N}(0, 1). \tag{27}$$

Condition (23) is satisfied by definition of $\phi_i$. Condition (24) is also satisfied, because
the variance of $\phi_i(Y^{\mathrm{obs}})$ in Eq. (27) is constant. Furthermore,

$$\arg\max_{\alpha_i \in \mathcal{A}_i} \{f(\chi(\alpha_i))\} = \arg\max_{\alpha_i \in \mathcal{A}_i} \{-1/\alpha_i\} = \arg\max_{\alpha_i \in \mathcal{A}_i} \{\alpha_i\} = A_i^*,$$

which satisfies Condition (25). Thus, all conditions of Theorem 4.1 are fulfilled. It follows
that the new design $\mathcal{D}'$ is incentive-compatible.

By construction of the probabilistic model in Example 2(b), there is a very delicate
trade-off between expected return (agent performance) and risk; for example, if an agent doubles
its performance, then the risk will quadruple. In such situations, it is a bad idea to adopt the
sample mean as the score statistic. Intuitively, Eq. (26) shows that the higher-quality agent
will try more conservative actions, thus hiding its true quality. However, if agents are scored
according to the negated reciprocal of their sample mean, the probability that an agent wins
increases monotonically with the agent’s performance. Thus, agents have the incentive to
select actions that maximize their performance, and it is a dominant strategy to select
their natural action.
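
The contrast between the two score functions can be made concrete. The sketch below evaluates Eq. (26) for the sample-mean score and, for the negated-reciprocal score, the winning probability obtained from Eq. (27) by treating the two scores as independent with unit asymptotic variance (so their difference has variance 2). That second formula is our own derivation under the normal approximation, positive actions are assumed, and all names and numeric values are illustrative.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def win_prob_mean(mu1, mu2, k):
    """Eq. (26): sample-mean score, where each sample mean has variance mu^4 / k."""
    return normal_cdf(math.sqrt(k) * (mu1 - mu2) / math.sqrt(mu1 ** 4 + mu2 ** 4))

def win_prob_neg_reciprocal(mu1, mu2, k):
    """Score -1/Ybar: each score has unit asymptotic variance by Eq. (27)."""
    return normal_cdf(math.sqrt(k) * (1.0 / mu2 - 1.0 / mu1) / math.sqrt(2.0))

# Assumed toy values: agent 2 plays mu = 1 and there are k = 25 units per agent.
for mu1 in (1.5, 2.0, 3.0, 4.0):
    print(mu1,
          round(win_prob_mean(mu1, 1.0, 25), 3),
          round(win_prob_neg_reciprocal(mu1, 1.0, 25), 3))
```

Under these assumed values the sample-mean probability rises and then falls as agent 1 raises its expected return, whereas the negated-reciprocal probability keeps increasing, which matches the incentive argument above.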

4.1 Powerful incentive-compatible experiment designs 

Given the choice of two incentive-compatible designs, it is natural to prefer the design in 
which the highest-quality agent has the highest probability of winning. We formalize this 
intuition through the following definition. 

Definition 4.1 (Powerful incentive-compatible design). Consider two experiment designs
$\mathcal{D}$ and $\mathcal{D}'$ that are both incentive-compatible and operate on the same set of units $U$. Let $\tau$
be the agent of highest quality. Design $\mathcal{D}'$ is (weakly) more powerful than design $\mathcal{D}$ if the
probability that agent $\tau$ wins in the dominant strategy equilibrium is higher in $\mathcal{D}'$ than in $\mathcal{D}$;

$$P_\tau(A^* \mid \mathcal{D}') \geq P_\tau(A^* \mid \mathcal{D}), \tag{28}$$

where $A^*$ is the natural action profile, which is the same in both designs.

In the following theorem, we give a simple case where we can transform an incentive- 
compatible design into a more powerful one. 

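Theorem 4.2. Consider an incentive-compatible design $\mathcal{D} = (\psi, \phi)$, where action sets
$\mathcal{A}_i \subset \mathbb{R}$ are compact, and performance $\chi$ is one-to-one and continuous. Let,

$$\sqrt{k}\,\left(\phi_i(Y^{\mathrm{obs}}) - \chi(A_i)\right) \xrightarrow{d} \mathcal{N}\left(0, \sigma^2(A_i)\right), \tag{29}$$

where function $\sigma^2 : \mathcal{A} \to \mathbb{R}_+$ satisfies

$$\chi(\alpha'_i) > \chi(\alpha_i) \implies \sigma^2(\alpha'_i) > \sigma^2(\alpha_i), \tag{30}$$

for every agent $i$, and all actions $\alpha'_i, \alpha_i \in \mathcal{A}_i$.$^7$

Consider a design $\mathcal{D}' = (\psi, \phi')$, where $\phi'_i(Y^{\mathrm{obs}}) = \nu(\phi_i(Y^{\mathrm{obs}}))$, for each agent $i$, with $\nu(\cdot)$
defined by

$$\nu(y) = \int_0^y \frac{1}{\sqrt{\sigma^2(\chi^{-1}(z))}}\, dz. \tag{31}$$

Then, design $\mathcal{D}'$ is incentive-compatible and more powerful than $\mathcal{D}$, if $\nu(\cdot)$ is convex, or
$1/\sqrt{\sigma^2(\chi^{-1}(\cdot))}$ and $\sigma^2(\chi^{-1}(\cdot))$ are both convex.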

The variance of the new score function, $\mathrm{Var}(\phi'_i(Y^{\mathrm{obs}}))$, is constant, because function $\nu$
defined in Eq. (31) is a variance-stabilizing transformation [5]. This fulfills Condition (24)
of Theorem 4.1, while the monotonicity (30) of $\sigma(\cdot)$ maintains the monotonicity Condition
(25). The new design $\mathcal{D}'$ is thus incentive-compatible.
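
The integral in Eq. (31) can be evaluated numerically when no closed form is at hand. The sketch below uses a midpoint rule and checks it on the variance function $\sigma^2(\chi^{-1}(z)) = z$, which is the Poisson case, where the antiderivative is $2\sqrt{z}$. The function name, the integration limits, and the step count are assumptions for illustration; the lower limit is taken away from zero to avoid the singularity, which only shifts $\nu$ by a constant and does not change which agent has the larger score.

```python
import math

def variance_stabilizer(sigma2_of_chi_inv, lower, upper, steps=10_000):
    """Midpoint-rule value of the integral of 1/sqrt(sigma^2(chi^{-1}(z))) over [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.0
    for s in range(steps):
        z = lower + (s + 0.5) * h          # midpoint of the s-th subinterval
        total += h / math.sqrt(sigma2_of_chi_inv(z))
    return total

# Poisson-type variance function: sigma^2(chi^{-1}(z)) = z.
numeric = variance_stabilizer(lambda z: z, 1.0, 9.0)
closed_form = 2.0 * math.sqrt(9.0) - 2.0 * math.sqrt(1.0)
print(numeric, closed_form)
```

The same helper can be pointed at any positive variance function, so it offers a quick way to tabulate a candidate transformation before checking the convexity conditions of Theorem 4.2.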

Example 3(b) — Poisson outcomes. Continuing from Example 3(a), the actions are 
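$A_i = (\lambda_i) \in \mathbb{R}_+$ with performance $\chi(A_i) = \lambda_i$, while the score statistic is
$\phi_i(Y^{\mathrm{obs}}) = \bar{Y}_i^{\mathrm{obs}}$; thus, $\sqrt{k}\,(\phi_i(Y^{\mathrm{obs}}) - \lambda_i) \xrightarrow{d} \mathcal{N}(0, \lambda_i)$. Let agent 1 be the best agent.
Consider a new design $\mathcal{D}'$ with the transformation

$$\nu(y) = \int_0^y \frac{1}{\sqrt{\sigma^2(\chi^{-1}(z))}}\, dz = \int_0^y \frac{1}{\sqrt{z}}\, dz = 2\sqrt{y},$$

and score function $\phi'_i(Y^{\mathrm{obs}}) = \nu(\phi_i(Y^{\mathrm{obs}})) = 2\sqrt{\bar{Y}_i^{\mathrm{obs}}}$. Design $\mathcal{D}'$ is incentive-compatible and
more powerful than design $\mathcal{D}$ of Example 3(a) by Theorem 4.2, since $1/\sqrt{\sigma^2(\chi^{-1}(z))} = 1/\sqrt{z}$
and $\sigma^2(\chi^{-1}(z)) = z$ are both convex. Another way to see this is through Proposition 4.1,
which implies $\sqrt{k}\,(\phi'_i(Y^{\mathrm{obs}}) - 2\sqrt{\lambda_i}) \xrightarrow{d} \mathcal{N}(0, 1)$. Thus, the probability that agent 1 wins is

$$P_1(A \mid \mathcal{D}') = \Phi\left(\sqrt{2k}\,\left(\sqrt{\lambda_1} - \sqrt{\lambda_2}\right)\right). \tag{32}$$

We can verify $P_1(A \mid \mathcal{D}') \geq P_1(A \mid \mathcal{D})$ by comparing Eq. (32) with Eq. (14):

$$\Phi\left(\sqrt{2k}\,\left(\sqrt{\lambda_1} - \sqrt{\lambda_2}\right)\right) \geq \Phi\left(\sqrt{k}\, \frac{\lambda_1 - \lambda_2}{\sqrt{\lambda_1 + \lambda_2}}\right) \iff \sqrt{2}\,\left(\sqrt{\lambda_1} - \sqrt{\lambda_2}\right) \geq \frac{\lambda_1 - \lambda_2}{\sqrt{\lambda_1 + \lambda_2}}.$$

Since $\lambda_1 > \lambda_2$, the last inequality always holds because it reduces to $(\sqrt{\lambda_1} - \sqrt{\lambda_2})^2 \geq 0$.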
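
The comparison of Eq. (32) with Eq. (14) can also be checked on a grid of rates. This is a minimal sketch; the function names, the grid, and the value of $k$ are illustrative assumptions, and agent 1 is taken to be the better agent throughout.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def win_prob_identity(lam1, lam2, k):
    """Eq. (14): sample-mean score."""
    return normal_cdf(math.sqrt(k) * (lam1 - lam2) / math.sqrt(lam1 + lam2))

def win_prob_sqrt(lam1, lam2, k):
    """Eq. (32): square-root (variance-stabilized) score."""
    return normal_cdf(math.sqrt(2.0 * k) * (math.sqrt(lam1) - math.sqrt(lam2)))

# Assumed grid with lam1 > lam2 in every pair, and k = 10 units per agent.
grid = [(lam1, lam2) for lam1 in (2.0, 5.0, 10.0) for lam2 in (1.0, 1.5, 1.9)]
gaps = [win_prob_sqrt(a, b, 10) - win_prob_identity(a, b, 10) for a, b in grid]
print(min(gaps))
```

A non-negative minimum gap on the grid is consistent with the inequality above; the gain is most visible when the two rates are close and shrinks as both probabilities approach one.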

In Example 3(b), the better agent (agent 1) has higher chances of winning in the new
design $\mathcal{D}'$. Since $\mathcal{D}'$ is also incentive-compatible, it follows that $\mathcal{D}'$ is more powerful than $\mathcal{D}$.
Intuitively, the square root transformation in the new design stabilizes the variance (there
is no denominator in Eq. (32)), which achieves incentive-compatibility through Theorem 4.1.


7 Condition (30) posits that an agent cannot increase its expected score without increasing the variance
of the score. This is a reasonable assumption in practice because actions that do increase the expected score
without increasing the variance are strongly preferred.

4.2 Using transformations for more powerful designs


If there is only one block and the transformation $\nu(\cdot)$ in Theorem 4.2 is order-preserving, then
the transformation does not affect the power of the experiment design. For a simple argument,
let $X, Y$ be two positive random variables; then $P(X > Y) = P(\nu(X) > \nu(Y))$ if $\nu$ is
order-preserving.$^8$

However, when there are multiple blocks, a transformation can improve the power of the
design even when the transformation is order-preserving. In the following simulation study,
we expand the design introduced in Example 3(a) to multiple blocks in order to illustrate
the positive effect of the square-root transformation, which is variance-stabilizing for Poisson
outcomes, on the power of the design. In this simulation study we focus on power because
the design is already incentive-compatible, as shown in Example 3(a).$^9$

Consider a design $\mathcal{D}$ with two agents and two blocks. Agent $i$ plays action $\lambda_{ib}$ in block
$b$; we set $\lambda_{11} = 5$, $\lambda_{12} = 10$ for agent 1, and $\lambda_{21} = 4.25$, $\lambda_{22} = 9.95$ for agent 2, and thus
agent 1 is the high-quality agent. We repeat the following process 10,000 times. First we fix the
number of units per block, say $k$. Second, we sample $Y_{uib} \sim \mathrm{Pois}(\lambda_{ib})$ i.i.d. for every unit $u$ in
block $b$, where $Y_{uib}$ indicates the total number of sales for unit $u$ assigned to agent $i$ in block $b$.
We then use the sample mean as the default score function, but also apply a transformation
$\nu$. In particular, the total score of agent $i$ is $\sum_{b=1}^{2} \nu(\bar{Y}_{ib})$, where $\bar{Y}_{ib}$ is the sample mean of the
vector $Y_{ib} = (Y_{uib})$ of unit outcomes for agent $i$ in block $b$, and $\nu$ is the transformation. The
winner is the agent with the highest score. After all 10,000 repetitions we report the fraction of
wins by agent 1.

The results are shown in Table 2, where we compare the identity transformation against
the square-root transformation for several numbers of units per block. We observe that the
square-root transformation, which is also the variance-stabilizing transformation according to
Theorem 4.2, increases the winning chances of agent 1 (the high-quality agent). As the number
of units per block increases, the sample means get closer to the actions played by the agents
(i.e., values $\lambda_{ib}$), and thus agent 1 wins almost with probability one under both designs.

For an intuition why variance stabilization works with multiple blocks, consider the
argument at the beginning of this section. In particular, let $X_1 = \bar{Y}_{11}$, $X_2 = \bar{Y}_{12}$ be the sample
means of agent 1 in blocks 1 and 2, respectively, and let $Y_1 = \bar{Y}_{21}$, $Y_2 = \bar{Y}_{22}$ be the
respective sample means for agent 2. If there was no transformation, the winning probability for
agent 1 would be $P(X_1 + X_2 > Y_1 + Y_2)$. With the square-root transformation this
probability is $P(\sqrt{X_1} + \sqrt{X_2} > \sqrt{Y_1} + \sqrt{Y_2})$, which is generally larger than the probability without
transformation. Intuitively, the square-root transformation accentuates the differences in
the mean rates of the two agents (i.e., the actions $\lambda_{ib}$) and downplays the differences in the
tails. The formal proof is a simple extension of Theorem 4.2, which uses convexity/concavity
arguments.
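
The simulation study can be reproduced in outline with the standard library. The sketch below is not the authors' code: the Poisson sampler (Knuth's multiplication method), the seed, the reduced number of repetitions, the treatment of ties as half a win, and the reading of $k$ as the number of units each agent receives in each block are all assumptions made for illustration.

```python
import math
import random

def poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (fine for small rates)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def block_mean(lam, k, rng):
    """Sample mean of k i.i.d. Poisson(lam) outcomes."""
    return sum(poisson(lam, rng) for _ in range(k)) / k

def win_rates(k, reps, rng, rates=((5.0, 10.0), (4.25, 9.95))):
    """Share of repetitions won by agent 1 under nu(x) = x and nu(x) = sqrt(x)."""
    wins = {"identity": 0.0, "sqrt": 0.0}
    transforms = {"identity": lambda x: x, "sqrt": math.sqrt}
    for _ in range(reps):
        # means[i][b] is the sample mean of agent i in block b.
        means = [[block_mean(lam, k, rng) for lam in agent] for agent in rates]
        for name, nu in transforms.items():
            s1 = sum(nu(m) for m in means[0])
            s2 = sum(nu(m) for m in means[1])
            wins[name] += 1.0 if s1 > s2 else 0.5 if s1 == s2 else 0.0
    return wins["identity"] / reps, wins["sqrt"] / reps

print(win_rates(k=25, reps=1000, rng=random.Random(0)))
```

Both transformations are applied to the same simulated data in each repetition, so the difference between the two win rates is less noisy than either rate on its own.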


8 A similar observation can be made in regard to the use of score functions $\phi_i$ to achieve incentive
compatibility: order-preserving transformations $\phi_i$ do not affect incentives. Note, for example, that the negated
reciprocal transformation that aligns incentives in Example 2(d) is not order-preserving (e.g., $2 > -1$ but
$-1/2 < -1/(-1)$). The outcomes in that example could take negative values; if outcomes were constrained
to be positive, incentives would not be affected.

9 The introduction of multiple blocks does not affect the incentives because incentive compatibility was
defined with respect to dominant-strategy equilibrium and outcomes are sampled independently across blocks.
Multiple blocks could affect incentives if agents were able to benefit from making strategic trade-offs between
blocks, e.g., be conservative in one block and be risky in another.

Table 2: Probability that agent 1 wins in a design with two blocks and two possible score
transformations. Probabilities were calculated over 10,000 repetitions.

                    Transformation $\nu$
#units/block    $\nu(x) = x$    $\nu(x) = \sqrt{x}$
5               0.62            0.65
10              0.67            0.72
25              0.77            0.82
50              0.85            0.91
100             0.93            0.97
500             1.00            1.00
1000            1.00            1.00


5 Incentive-compatible experiments with interference 

We now consider strategic interference, whereby an action of an agent can affect the outcomes 
of units assigned to another agent. Therefore, agent scores calculated on individual agent 
outcomes are confounded with the entire action profile. 

Example 3(c) — Poisson outcomes with interference. Building upon Example 3(b), 
we now introduce a more realistic model of the viral marketing experiment, which we assume 
operates as follows. 

As before, units are assigned to agent 1 or agent 2. We refer to the units assigned to
agent $i$, i.e., the set $\{u \in U : Z_u = i\}$, as the test set of agent $i$. In addition, each agent is
free to pick a seed set; each seed set is in a separate population that is disjoint from the test
sets. The seed set $i$ corresponds to treatment version (agent action) $A_i$. The seed set will be
targeted with a promotional campaign, and outcomes will be measured on units only in the
test sets, say, number of purchases for each unit. The rationale is that the experimenter is
interested in the viral marketing efficacy of the agents, i.e., their ability to select influential
seed sets.

Under interference, the treatment version (seed set) selected by agent $i$ induces a rate
$\lambda_i$ on units assigned to $i$, and a rate $\gamma\lambda'_i$, where $0 \leq \gamma \leq 1$, on units assigned to the other
agent. The parameter $\gamma$ models the amount of interference; if $\gamma = 0$ there is no interference,
whereas $\gamma = 1$ indicates maximum interference. For the rest of this paper we will consider
$\gamma$ known to agents and the designer, but this is without loss of generality. Rate $\lambda'_i$ can be
interpreted as the rate that agent $i$ would achieve if the units that are targeted were its own
units. Parameter $\gamma$ represents a discount because the targeted units are in the test set of
another agent.

The setting with interference is depicted in Figure 1. The labels on the edges correspond
to the effects from the seed sets, including interference effects. For example, the purchase
rate in test set 2 (units assigned to agent 2) is equal to $\gamma\lambda'_1 + \lambda_2$; the first term is the
discounted influence from the seed set of agent 1, and the second term is the influence from the
seed set of agent 2. Agents are scored based on outcomes of units in their respective test
sets. Therefore, an agent can also “free-ride” on the conversion rate that comes from the
action of the other agent.



Figure 1: Test set $i$ has units assigned to agent $i$, i.e., $\{u \in U : Z_u = i\}$. Seed set $i$
corresponds to the treatment version $A_i$. The seed sets influence the purchase rate of units
in the test sets, for example, through word-of-mouth effects between units. In particular,
$A_i = (\lambda_i, \lambda'_i)$, where $\lambda_i$ is the induced rate from seed set $i$ to test set $i$, and $\gamma\lambda'_i$ is the induced
rate from seed set $i$ to the other test set, where $0 \leq \gamma \leq 1$ is a parameter that models
interference. Outcomes, i.e., product purchases, are measured on units in the test sets; the
score of agent $i$ will be calculated based on observed purchases in test set $i$. Arrows indicate
induced purchase rates from the seed sets; dashed arrows indicate that the rate is discounted
by $\gamma$. The presence of interference, where an agent can affect the purchase rate on a test set
of another agent, changes how agents select their seed sets, i.e., their treatment versions.


Example 3(d) — Poisson outcomes with interference. Given the interference model 
of Example 3(c), the actions are $A_1 = (\lambda_1, \lambda'_1)$, $A_2 = (\lambda_2, \lambda'_2)$, and the observed outcomes on
the units in the test sets have the following distributions:

$$Y_1^{\mathrm{obs}} \sim \mathrm{Pois}(\lambda_1 + \gamma\lambda'_2), \qquad Y_2^{\mathrm{obs}} \sim \mathrm{Pois}(\lambda_2 + \gamma\lambda'_1). \tag{33}$$

To derive the performance of an agent, say agent 1, we need to replace agent 2 with a
replicate of agent 1, playing action $A_2 = (\lambda_1, \lambda'_1)$. In this case, the induced rate on the units
assigned to agent 1 is actually equal to $\lambda_1 + \lambda'_1$ since, by definition of our interference model
in Example 3(c), a rate is discounted only from a seed set of one agent to the test set of
another agent. Thus, the performance of agent $i$ for action $\alpha_i = (\lambda_i, \lambda'_i)$ is equal to

$$\chi(\alpha_i) = E\left(Y_u(Z, A) \mid A = \alpha_i \mathbf{1},\ Z_u = i\right) = \lambda_i + \lambda'_i. \tag{34}$$


It can be seen, by inspection of Eq. (33), that the outcomes of a unit assigned to one agent
depend on the action of the other agent. For example, the outcomes $Y_1^{\mathrm{obs}}$ on units assigned to
agent 1 depend on action $A_1$ of agent 1 as well as action $A_2$ of agent 2. Hence, the observed
outcomes for one agent carry statistical information about the action of the other agent. This
information should be used in order to correctly estimate the agent qualities, and then the
agent of highest quality.

However, the estimation of qualities is not possible through outcomes (33), because there
exist multiple action profiles for which the observed outcomes are equally likely. It follows
that there is no identifying statistic, and our theory (e.g., Theorem 3.1) cannot be applied.
Furthermore, the variance-stabilizing transformations that were shown to give more
powerful designs in Example 3(b) do not work. This is illustrated in the following example.

Example 3(e). — Poisson outcomes with interference. Consider the setup of Example 
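3(c) and an experiment $\mathcal{D}$ with the usual score function $\phi_i(Y^{\mathrm{obs}}) = \bar{Y}_i^{\mathrm{obs}}$. As the number of
experimental units grows, Eq. (33) results in the following asymptotics:

$$\sqrt{k}\,\left(\bar{Y}_1^{\mathrm{obs}} - (\lambda_1 + \gamma\lambda'_2)\right) \xrightarrow{d} \mathcal{N}(0, \lambda_1 + \gamma\lambda'_2),$$

$$\sqrt{k}\,\left(\bar{Y}_2^{\mathrm{obs}} - (\lambda_2 + \gamma\lambda'_1)\right) \xrightarrow{d} \mathcal{N}(0, \lambda_2 + \gamma\lambda'_1).$$

Therefore, the probability that agent 1 wins is

$$P_1(A \mid \mathcal{D}) = \Pr\left(\bar{Y}_1^{\mathrm{obs}} > \bar{Y}_2^{\mathrm{obs}}\right) = \Phi\left(\sqrt{k}\, \frac{(\lambda_1 - \gamma\lambda'_1) - (\lambda_2 - \gamma\lambda'_2)}{\sqrt{\lambda_1 + \gamma\lambda'_1 + \lambda_2 + \gamma\lambda'_2}}\right). \tag{35}$$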


This design is not incentive-compatible because agent 1 prefers a large $\lambda_1 - \gamma\lambda'_1$ and a
small $\lambda_1 + \gamma\lambda'_1$. As can be seen from Figure 1, a purchase rate of $\gamma\lambda'_1$ from the seed set
of agent 1 only benefits agent 2. Thus, agent 1 wants to benefit its assigned units (test
set 1) while minimizing the spillovers to test set 2 that benefit only agent 2. However, the
experimenter wants to know something very different. In particular, given the definition of
performance in Example 3(d), the experimenter wants to know the maximum $\lambda_1 + \lambda'_1$ that
agent 1 can achieve (and the maximum $\lambda_2 + \lambda'_2$ for agent 2). This quantity is of interest because
it is the quantity that agent 1 would maximize if a copy of agent 1 substituted agent 2, and
also played $(\lambda_1, \lambda'_1)$.

Using the variance-stabilizing transformation of Example 3(b) does not solve the
problem. In particular, if we use $\phi_i(Y^{\mathrm{obs}}) = 2\sqrt{\bar{Y}_i^{\mathrm{obs}}}$ as the score function, then the winning
probability of agent 1 becomes

$$P_1(A \mid \mathcal{D}) = \Phi\left(\sqrt{2k}\,\left(\sqrt{\lambda_1 + \gamma\lambda'_2} - \sqrt{\lambda_2 + \gamma\lambda'_1}\right)\right).$$

The incentive problem remains because agent 1 still wants to achieve a high purchase rate
$\lambda_1$ on units in test set 1, and a low rate $\gamma\lambda'_1$ on units of test set 2.
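
The misaligned incentive can be illustrated by evaluating Eq. (35) for two actions of agent 1 that differ only in the spillover rate. This is a sketch with hypothetical names and toy values ($\gamma = 0.5$, $k = 20$, and the listed actions are assumptions).

```python
import math

def normal_cdf(x):
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def win_prob_interference(a1, a2, gamma, k):
    """Eq. (35): agent 1's winning probability under the sample-mean score with interference."""
    (lam1, lam1p), (lam2, lam2p) = a1, a2
    rate1 = lam1 + gamma * lam2p   # rate in test set 1
    rate2 = lam2 + gamma * lam1p   # rate in test set 2
    return normal_cdf(math.sqrt(k) * (rate1 - rate2) / math.sqrt(rate1 + rate2))

a2 = (3.0, 1.0)                    # assumed action of agent 2
low_spill, high_spill = (3.0, 0.0), (3.0, 2.0)
for a1 in (low_spill, high_spill):
    print(a1, sum(a1), round(win_prob_interference(a1, a2, 0.5, 20), 3))
```

The action with the larger performance $\lambda_1 + \lambda'_1$ has the smaller winning probability here, so agent 1 is rewarded for suppressing exactly the quantity the experimenter wants to measure.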


5.1 Dealing with strategic interference through better designs 

We now describe a method to construct an incentive-compatible design in the viral marketing 
problem with interference. The idea is to introduce a new design that will provide an 
identifying statistic, and then define appropriate score functions to fulfill the conditions of 
Theorem 3.1 that guarantee incentive-compatibility.

Example 3(f). Poisson outcomes with interference, new design. We consider
the following new design. The units are split in two groups, say $G_1$ and $G_2$. Within each
group, units are randomly assigned to the two agents, resulting in two test sets per agent. For
example, group $G_1$ has two test sets, namely $G_{11}$ with units assigned to agent 1, and $G_{12}$
with units assigned to agent 2. Similarly, group $G_2$ has test sets $G_{21}$ with units assigned
to agent 1, and $G_{22}$ with units assigned to agent 2. Test sets in the same group may be
overlapping. In addition, each agent is free to pick one seed set; each seed set is in a separate
population that is disjoint from the test sets. The seed set $i$ corresponds to treatment version
(agent action) $A_i$. The outcomes $Y$, say number of purchases for each unit, for each agent
$i$, will be measured on units only in their two test sets, namely $G_{1i}$ and $G_{2i}$. This design is
depicted in Figure 2.



Figure 2: Test sets $G_{1j}$ and $G_{2j}$ have the units assigned to agent $j$, i.e., $\{u \in U : Z_u = j\}$;
there are two test sets per agent. Agent $i$ selects an influential seed set $i$, which corresponds
to the treatment version $A_i$. The seed sets influence the purchase rate of units in the test
sets. In particular, $A_i = (\lambda_i, \lambda'_i)$, where $\lambda_i$ and $\lambda'_i$ are the rates induced by seed set $i$ on units
in groups $G_1$ and $G_2$, respectively; the rate is discounted by $\gamma$ when the test set contains
units assigned to the other agent. Outcomes are measured on units in the test sets; the
score of agent $i$ will be calculated based on observed purchases of units assigned to agent
$i$; for example, agent 1 will be scored based on outcomes of units in $G_{11}$ and $G_{21}$. Arrows
indicate induced purchase rates from the seed sets; dashed arrows indicate that the rate is
discounted by $\gamma$. The presence of interference, where an agent can affect the purchase rate
on a test set of another agent, changes how agents select their seed sets, i.e., their treatment
versions.

The outcomes model is similar to the design of Example 3(c) (see also Figure 1). A seed
set $i$ (action $A_i$) induces a rate $\lambda_i$ on units of group $G_1$, and a rate $\lambda'_i$ on units of the other
group. The rate is assumed to be discounted when the seed set is targeting units in a test
set of another agent. For example, units in test set $G_{12}$ will have purchase rate $\lambda_2 + \gamma\lambda_1$; the
rate $\lambda_2$ originates from seed set 2 affecting units in group $G_1$, and rate $\lambda_1$ is from seed set 1
affecting units in $G_1$, discounted by $\gamma$ because $G_{12}$ is a test set of agent 2. Thus, action $A_i$
is associated with a pair of rates, $A_i = (\lambda_i, \lambda'_i)$.

Agent 1’s action is $A_1 = (\lambda_1, \lambda'_1)$, and agent 2’s action is $A_2 = (\lambda_2, \lambda'_2)$. Therefore, the
observed outcomes of units are distributed as follows:

$$Y_u \sim \begin{cases} \mathrm{Pois}(\lambda_1 + \gamma\lambda_2), & \text{if } u \in G_{11}, \\ \mathrm{Pois}(\lambda_2 + \gamma\lambda_1), & \text{if } u \in G_{12}, \\ \mathrm{Pois}(\lambda'_1 + \gamma\lambda'_2), & \text{if } u \in G_{21}, \\ \mathrm{Pois}(\lambda'_2 + \gamma\lambda'_1), & \text{if } u \in G_{22}. \end{cases} \tag{36}$$


Using the same interference model (parameter $\gamma$ of discounted influence) introduced in
Example 3(c), the new design of Figure 2 now provides more information about the agent
actions, and thus their performance, through outcomes (36). This additional information
provides an identifying statistic that can be used to define score functions that make the
design of Figure 2 incentive-compatible.

Example 3(g). — Poisson outcomes. By symmetry of the new design, the experimenter 
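is interested in estimating $\chi(A_i) = \lambda_i + \lambda'_i$. Let $\bar{Y}_{ij}$ be the sample mean of outcomes of units
in test set $G_{ij}$, and let $Y = (\bar{Y}_{11}, \bar{Y}_{12}, \bar{Y}_{21}, \bar{Y}_{22})^\top$. Define the matrices

$$B = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}, \quad \text{and} \quad C = \begin{pmatrix} 1 & 0 & \gamma & 0 \\ \gamma & 0 & 1 & 0 \\ 0 & 1 & 0 & \gamma \\ 0 & \gamma & 0 & 1 \end{pmatrix}.$$

Denote the action profile as $A = (\lambda_1, \lambda'_1, \lambda_2, \lambda'_2)^\top$. Further, let $D_A = \mathrm{diag}(CA)$ be the
diagonal matrix with diagonal elements from the vector $CA$. By Eq. (36), as the number
of units grows, we have

$$\sqrt{m/4}\,\left(Y - CA\right) \xrightarrow{d} \mathcal{N}(0, D_A). \tag{37}$$

The term $m/4$ is because there are $m/4$ units per test set. Now define the statistic
$T = BC^{-1}Y$, which requires $\gamma < 1$ so that $C$ is invertible. Since
$\chi(A) = (\lambda_1 + \lambda'_1, \lambda_2 + \lambda'_2)^\top = BA$, it holds, asymptotically,$^{10}$

$$\sqrt{m/4}\,\left(T - \chi(A)\right) \xrightarrow{d} \mathcal{N}\left(0, BC^{-1} D_A (C^{-1})^\top B^\top\right). \tag{38}$$

Therefore, the new design has identifiable performance, and $T$ is an identifying statistic,
with covariance matrix $\Sigma(A) = BC^{-1} D_A (C^{-1})^\top B^\top$.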

Now, using the notation of Theorem 3.1, define the score function simply as

$$\phi_i(Y^{\mathrm{obs}}) = f(T_i) = T_i. \tag{39}$$

Thus, the Jacobian of $\phi$ is $J_\phi = I$, the identity matrix. The matrix $V_f(A)$ of Theorem 3.1 is
calculated as

$$V_f(A) = J_\phi\, \Sigma(A)\, J_\phi^\top = BC^{-1} D_A (C^{-1})^\top B^\top. \tag{40}$$


10 The normality of $T$ follows from the asymptotic normality of $Y$. The expected value of $T$ is
$E(T) = E(BC^{-1}Y) = BC^{-1}CA = BA$, and its variance is
$\mathrm{Var}(T) = \mathrm{Var}(BC^{-1}Y) = BC^{-1}\,\mathrm{Var}(Y)\,(C^{-1})^\top B^\top = BC^{-1}\left(D_A/(m/4)\right)(C^{-1})^\top B^\top$.
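
Through simple but tedious matrix algebra we obtain,

$$V_f(A) = \frac{1}{(1 - \gamma^2)^2} \begin{pmatrix} d_1 + \gamma^2 d_2 + d_3 + \gamma^2 d_4 & -\gamma \sum_{i=1}^{4} d_i \\ -\gamma \sum_{i=1}^{4} d_i & \gamma^2 d_1 + d_2 + \gamma^2 d_3 + d_4 \end{pmatrix}, \tag{41}$$

where $(d_i)$ are the diagonal elements of $D_A$; thus, $d_1 = \lambda_1 + \gamma\lambda_2$, $d_2 = \gamma\lambda_1 + \lambda_2$,
$d_3 = \lambda'_1 + \gamma\lambda'_2$, and $d_4 = \gamma\lambda'_1 + \lambda'_2$. In particular,

$$\sum_{i=1}^{4} d_i = (1 + \gamma)\left[(\lambda_1 + \lambda'_1) + (\lambda_2 + \lambda'_2)\right].$$

It follows from Eq. (17) of Theorem 3.1 that

$$v^f_{ij}(\alpha_i \mid A_{-i}) = \frac{1}{(1 - \gamma^2)^2}\left[(d_1 + \gamma^2 d_2 + d_3 + \gamma^2 d_4) + (\gamma^2 d_1 + d_2 + \gamma^2 d_3 + d_4) - \left(-2\gamma \sum_{i=1}^{4} d_i\right)\right]$$

$$= \frac{(1 + \gamma)^2}{(1 - \gamma^2)^2} \sum_{i=1}^{4} d_i = \frac{(1 + \gamma)^3}{(1 - \gamma^2)^2}\left[(\lambda_1 + \lambda'_1) + (\lambda_2 + \lambda'_2)\right], \tag{42}$$

if $i \neq j$, and 0 otherwise. Since the factor in front of the bracket does not depend on the
actions, it follows that

$$\arg\max_{\alpha_i \in \mathcal{A}_i} \left\{ \frac{f(\chi(\alpha_i))}{v^f_{ij}(\alpha_i \mid A_{-i})^{1/2}} \right\} = \arg\max_{\alpha_i \in \mathcal{A}_i} \left\{ \frac{\lambda_i + \lambda'_i}{\sqrt{(\lambda_1 + \lambda'_1) + (\lambda_2 + \lambda'_2)}} \right\}. \tag{43}$$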

The expression on the right of Eq. (43) is increasing with respect to $\chi(\alpha_i) = A_i + A_i'$.
Therefore, each agent prefers to play actions $(A_i, A_i')$ so as to maximize their sum, $A_i + A_i'$,
which is the quantity of interest to the experimenter. Condition (18) of Theorem 3.1 is
fulfilled. Thus, incentives are aligned under the new design. Intuitively, the new design
allows all agents to benefit from spillovers. For example, in the previous design, agent 1
could not benefit from the spillover of seed set 1 to test set 2, because agent 1's score was
calculated only on test set 1. However, in the new design, the score of agent 1 includes
outcomes from units in the test set $G_{21}$, which receives spillovers from seed set 1.
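
As an illustrative numerical check, which is not part of the original analysis, the following Python sketch builds $B$ and $C$ for a given $\gamma$, computes $V(A) = BC^{-1} D_A (C^{-1})^\top B^\top$ directly, and compares it with the closed form of Eq. (41). The prefactor $1/(1-\gamma^2)^2$, the value `gamma = 0.3`, and the action profile `A` are assumptions made for illustration; the function names are hypothetical.

```python
import numpy as np

def build_matrices(gamma):
    """Return B (2x4) and C (4x4) for the action ordering (A_1, A_1', A_2, A_2')."""
    B = np.array([[1.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 1.0]])
    C = np.array([[1.0, 0.0, gamma, 0.0],
                  [gamma, 0.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0, gamma],
                  [0.0, gamma, 0.0, 1.0]])
    return B, C

def v_matrix(A, gamma):
    """V(A) = B C^{-1} D_A (C^{-1})^T B^T, with D_A = diag(C A)."""
    B, C = build_matrices(gamma)
    M = B @ np.linalg.inv(C)
    return M @ np.diag(C @ A) @ M.T

def v_closed_form(A, gamma):
    """Closed form of V(A), assuming the prefactor 1 / (1 - gamma^2)^2."""
    _, C = build_matrices(gamma)
    d1, d2, d3, d4 = C @ A
    s = d1 + d2 + d3 + d4
    g2 = gamma ** 2
    V = np.array([[d1 + g2 * d2 + d3 + g2 * d4, -gamma * s],
                  [-gamma * s, g2 * d1 + d2 + g2 * d3 + d4]])
    return V / (1.0 - g2) ** 2

def objective(x_i, x_other):
    """Right-hand side objective of Eq. (43): x_i / sqrt(x_i + x_other)."""
    return x_i / np.sqrt(x_i + x_other)

gamma = 0.3                          # illustrative interference parameter
A = np.array([2.0, 1.5, 1.0, 0.5])   # illustrative (A_1, A_1', A_2, A_2')

V_num = v_matrix(A, gamma)
V_cf = v_closed_form(A, gamma)

# v_1^2 = V_11 + V_22 - V_12 - V_21, i.e. c^T V c with c = (1, -1)
c = np.array([1.0, -1.0])
v_12 = c @ V_num @ c
v_12_closed = (1.0 + gamma) ** 3 * A.sum() / (1.0 - gamma ** 2) ** 2

# The objective should increase with the agent's own total A_i + A_i'
xs = np.linspace(0.1, 10.0, 50)
vals = objective(xs, x_other=1.5)
```

Under these assumptions the direct and closed-form matrices agree, and the objective is increasing in the agent's own sum, which is the monotonicity used to conclude that incentives are aligned.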


6 Conclusion 

We introduced game theory into experiments where the treatments are determined by actions 
of strategic agents, and where treatments can interfere with each other. The goal of the 
experiment is to estimate the agent that is best with respect to a quantity of interest, 
defined in a context without competition; e.g., average number of conversions from the 
agent’s algorithm for viral marketing. However, statistical estimation of the best agent is 
based on experiment data, generated with competition among agents. Thus, the game-theoretic
setting poses new challenges to the statistical analysis of experiment data, and may often
invalidate well-established experimental design methods. The goal of incentive-compatible
experimental design is to promote behaviors by agents that accord with the natural actions
the agents would take in the experiment if there were no competition.

When agent actions do not interfere with each other, we showed that incentive-compatible 
designs are possible through variance-stabilizing transformations of statistics that estimate 
how agents would perform without competition. Furthermore, we proved a result suggesting
that variance stabilization might, more generally, lead to more powerful incentive-compatible 
experiment designs, in which better agents have higher chances of winning. In the presence 
of interference, we showed that more elaborate designs are generally necessary to obtain 
statistics that estimate agent performances. In the context of a viral marketing application, 
we showed how a better design can be constructed that can account for interference among 
agents, e.g., when agents are able to free-ride on the advertising campaign of other agents. 
Appendix


A Extension to multiple blocks 


In this paper, our theory is developed and applied assuming only one block. However, it is
straightforward to extend it to multiple blocks in a typical blocking experiment design. In this
section, we give an outline of this extension.

The treatment assignment rule $\psi$ now groups units into $B$ blocks based on their covariates,
and then randomizes treatment (i.e., the assignment of units to agents) within the blocks; blocking
is performed in a deterministic way based on the publicly known covariates $\{X_u\}$, for each unit
$u$. Formally, rule $\psi$ is a probability distribution over the space of pairs of binary matrices
$\mathcal{T} \overset{\mathrm{def}}{=} (\{0,1\}^{m \times B}, \{0,1\}^{m \times n})$.

A pair $(W, Z) \in \mathcal{T}$ is called a treatment assignment, and has the following interpretation. The
element $W_{ub} = 1$ if unit $u$ is assigned to block $b$, and it is 0 otherwise. Similarly, $Z_{ui} = 1$ if unit
$u$ is assigned to agent $i$, and it is 0 otherwise. Using dot-notation, $W_{\cdot b}$ is the $b$th column of matrix
$W$, $W_{u \cdot}$ is the $u$th row of $W$ as a $B \times 1$ vector, and $W_{\cdot\cdot} = W$. Similarly for $Z$ and other matrices.
Finally, the notation $(W, Z) \sim \psi$ will denote a treatment assignment $(W, Z) \in \mathcal{T}$ that is sampled
according to rule $\psi$.


Example A1. Consider four experimental units (consumers) and two treatments (marketing
agents) that an experimenter wishes to evaluate. In particular, the experimenter is interested
in estimating which agent can achieve the highest number of sales. Suppose that, for each unit
$u$, the experimenter and the agents know the marital status (the only covariate). We assume that
units {1,2} are not married and {3,4} are, and these correspond to the two blocks $b \in \{1,2\}$.
The experimenter suspects that the outcomes differ systematically based on marital status, and
randomizes treatment within blocks. This design corresponds to treatment assignment rule $\psi$,
which samples with equal probability 1/4 from the treatment assignments $\{W, Z\}$, where

$$
Z \in \left\{
\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix},
\begin{pmatrix} 0 & 1 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix},
\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix},
\begin{pmatrix} 0 & 1 \\ 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}
\right\}
\quad \text{and} \quad
W = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}
$$

is the matrix that indicates the blocking. Some examples of dot-notation follow: $W_{1\cdot} = (1\ 0)^\top$ is
the assignment of unit 1 over blocks, $W_{\cdot 2} = (0\ 0\ 1\ 1)^\top$ is the assignment over units in block 2, etc.
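
The assignment rule of Example A1 can be sketched in Python by enumerating every within-block assignment of units to agents. This is a minimal illustration under the assumption that each block contains exactly as many units as there are agents, so that each agent receives one unit per block; the function name `blocked_assignments` and the zero-based unit indices are hypothetical conventions, not part of the paper.

```python
from itertools import permutations, product

def blocked_assignments(blocks, n_agents):
    """Enumerate all Z matrices that give each agent one unit within every block.

    Assumes (for illustration) that every block has exactly n_agents units.
    """
    per_block = []
    for units in blocks:
        # Each permutation maps the block's units to distinct agents.
        per_block.append([dict(zip(units, perm))
                          for perm in permutations(range(n_agents))])
    m = sum(len(units) for units in blocks)
    assignments = []
    for combo in product(*per_block):
        Z = [[0] * n_agents for _ in range(m)]
        for mapping in combo:
            for unit, agent in mapping.items():
                Z[unit][agent] = 1
        assignments.append(Z)
    return assignments

# Units 0,1 (not married) form block 0; units 2,3 (married) form block 1.
blocks = [(0, 1), (2, 3)]
W = [[1 if u in units else 0 for units in blocks] for u in range(4)]
Zs = blocked_assignments(blocks, n_agents=2)
```

Sampling uniformly from `Zs` (for instance with `random.choice`) then corresponds to the rule that picks each of the four assignments with probability 1/4.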


With multiple blocks, agents are allowed to play different actions across blocks. We would thus
write $A_{ib}$ for the action of agent $i$ in block $b$, and $\mathcal{A}_{ib}$ for the action space of this action.

With multiple blocks, there is also an additional block index for the potential and observed
outcomes. For example, $Y^{obs}_{ubi}$ is now the observed outcome of unit $u$ assigned to block $b$ and agent
$i$; with dot-notation, $Y^{obs}_{\cdot b \cdot}$ denotes the observed outcomes of units in block $b$. The experiment
design $\mathcal{D}$ now has multiple score functions, $\phi_b$, one per block. For example, $\phi_{ib}(Y^{obs}_b)$ is the score
of agent $i$ in block $b$ with data $Y^{obs}_b$. Similar extensions are straightforward for the concepts of
performance, natural action, and quality.

Given block-specific score functions, the winner of the experiment is the agent who won the 
majority of blocks, ignoring ties. When there is no interference across and within-blocks, then 
the experimenter can design an incentive-compatible design within each block using Theorem 3.1.
In this case, each block would have a separate identifying statistic. When the action space of an 
agent is the product space of the block action spaces, the agent will prefer to maximize its winning 
probability within each block. Therefore, the incentive-compatibility results of Theorems 3.1 and
IQ can be readily applied. The same results can be applied in the problem with interference, 
assuming that there is no between-block interference, i.e., an action of agent i in block b does not 
affect the outcomes for agent j in some other block b'. 
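
The block-majority rule described above can be sketched as follows. This is a minimal illustration, assuming one numeric score per agent per block; with more than two agents we read "majority" as "most blocks won", which is an assumption on our part, and the function names are hypothetical.

```python
from collections import Counter

def block_winners(scores_by_block):
    """For each block, return the index of the top-scoring agent, or None on a tie."""
    winners = []
    for scores in scores_by_block:
        top = max(scores)
        leaders = [i for i, s in enumerate(scores) if s == top]
        winners.append(leaders[0] if len(leaders) == 1 else None)
    return winners

def experiment_winner(scores_by_block):
    """Agent who wins the most blocks, ignoring tied blocks.

    Returns None when no blocks are decided or when the block counts are tied.
    """
    counts = Counter(w for w in block_winners(scores_by_block) if w is not None)
    if not counts:
        return None
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]

# Illustrative scores: three blocks, two agents.
example_scores = [[1.2, 0.7], [0.4, 0.9], [2.0, 1.1]]
winner = experiment_winner(example_scores)
```

Because the overall winner depends on each block only through that block's winner, an agent whose action space is a product over blocks can treat each block as a separate contest, which is the property the argument above relies on.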


B Proofs 


Theorem 3.1. Fix agent actions $A$, and consider design $\mathcal{D} = (\psi, \phi)$ that has an identifying statistic
$T$ with covariance matrix $\Sigma(A)$. Let $\phi_i(Y^{obs}) = f(T_i)$ for some function $f : \mathbb{R} \to \mathbb{R}$, and let $V_{ij}(A)$
be the $ij$th element of $V(A)$ defined in Eq. (16). Also define,

$$v_i^j(\alpha \mid A_{-i}) = V_{ii}(\alpha, A_{-i}) + V_{jj}(\alpha, A_{-i}) - V_{ij}(\alpha, A_{-i}) - V_{ji}(\alpha, A_{-i}).$$

The design $\mathcal{D}$ is incentive-compatible if, for every agent $i$,

$$
\arg\max_{\alpha_i \in \mathcal{A}_i} \left\{ \frac{f(\chi(\alpha_i))}{v_i^j(\alpha_i \mid A_{-i})^{1/2}} \right\}
= \arg\max_{\alpha_i \in \mathcal{A}_i} \{\chi(\alpha_i)\} = A_i^*,
$$

for every agent $j$, and all actions $A_{-i}$. In such case, we say that $T$ is aligned with performance $\chi$
through score $\phi$.


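Proof. For a vector $x \in \mathbb{R}^n$, let $f(x) = (f(x_1), f(x_2), \ldots, f(x_n))^\top$. From the Delta theorem [3, IS],
and the asymptotic property (15) of the identifying statistic $T$, we obtain

$$\sqrt{k}\,\big(f(T) - f(\chi(A))\big) \xrightarrow{d} \mathcal{N}\big(0, J_\phi \Sigma(A) J_\phi^\top\big), \qquad (44)$$

where $J_\phi$ is the Jacobian of $f$ at $\chi(A)$ (by definition, this is a diagonal matrix). The probability
that agent $i$ wins over $j$ is equal to

$$\Pr\big(\phi_i(Y^{obs}) > \phi_j(Y^{obs})\big) = \Pr\big(c^\top f(T) > 0\big), \qquad (45)$$

where $c = (0, \ldots, 1, 0, \ldots, -1, 0, \ldots)^\top$ is an $n \times 1$ vector with zero elements, except for $c_i = 1$ and
$c_j = -1$. Using Eq. (44), we have

$$\sqrt{k}\,\big(c^\top f(T) - c^\top f(\chi(A))\big) \xrightarrow{d} \mathcal{N}\big(0, c^\top J_\phi \Sigma(A) J_\phi^\top c\big). \qquad (46)$$

From (46), probability (45) becomes

$$
\Pr\big(\phi_i(Y^{obs}) > \phi_j(Y^{obs})\big)
= \Phi\left( \frac{\sqrt{k}\,\big[f_i(\chi(A)) - f_j(\chi(A))\big]}{v_i^j(A)^{1/2}} \right)
= \Phi\left( \frac{\sqrt{k}\,\big[f(\chi(A_i)) - f(\chi(A_j))\big]}{v_i^j(A)^{1/2}} \right),
$$

where $\Phi$ is the standard normal distribution function and $v_i^j(A)$ is given in Eq. (17). Therefore,
agent $i$ maximizes its winning probability by playing the natural action, by property (18). □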
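
The asymptotic winning probability in the proof above can be evaluated numerically. The sketch below is illustrative only: it assumes the normal approximation holds exactly, takes the vector of transformed performances $f(\chi(A))$ and the matrix $V(A)$ as given inputs, and uses a hypothetical function name `win_probability` with made-up example values.

```python
import numpy as np
from math import erf, sqrt

def win_probability(f_chi, V, i, j, k):
    """Normal approximation to Pr(agent i's score exceeds agent j's score).

    f_chi : vector of f(chi(A_l)) for every agent l (assumed known here)
    V     : matrix V(A) = J Sigma(A) J^T
    k     : sample-size scaling used in the asymptotic statement
    """
    c = np.zeros(len(f_chi))
    c[i], c[j] = 1.0, -1.0
    v_ij = float(c @ V @ c)          # V_ii + V_jj - V_ij - V_ji
    z = sqrt(k) * float(c @ f_chi) / sqrt(v_ij)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))   # standard normal CDF

# Illustrative inputs (not taken from the paper).
f_chi = np.array([1.2, 1.0])
V = np.array([[2.0, 0.5],
              [0.5, 1.0]])
p_01 = win_probability(f_chi, V, i=0, j=1, k=100)
p_10 = win_probability(f_chi, V, i=1, j=0, k=100)
```

The quadratic form `c @ V @ c` is exactly the quantity $v_i^j$, so an agent's action affects the winning probability only through the ratio of its transformed performance gap to $v_i^j(A)^{1/2}$.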
Theorem 4.1. Consider design $\mathcal{D} = (\psi, \phi)$ with an identifying statistic $T$ with covariance matrix
$\Sigma(A)$. Suppose Assumption 4.1 holds. If, for every agent $i$,

$$\phi_i(Y^{obs}) = f(T_i), \text{ where } f : \mathbb{R} \to \mathbb{R},$$

$$\mathrm{Var}(\phi_i(Y^{obs})) = \text{const.},$$

$$\arg\max_{\alpha_i \in \mathcal{A}_i} f(\chi(\alpha_i)) = \arg\max_{\alpha_i \in \mathcal{A}_i} \{\chi(\alpha_i)\} = A_i^*,$$

then design $\mathcal{D}$ is incentive-compatible.

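Proof. By Assumption 4.1 (no interference), $\Sigma(A)$ is diagonal; let $\Sigma(A) = \mathrm{diag}(\sigma_i^2(A))$. Then,
from Theorem 4.1 and Condition (23),

$$\mathrm{Var}(\phi_i(Y^{obs})) = f'(\chi(A_i))^2 \sigma_i^2(A) = c,$$

for some constant $c > 0$. Also by Condition (23), the Jacobian of $\phi$ at $A$ is given by
$J_\phi = \mathrm{diag}(f'(\chi(A_i)))$. Using the notation of Theorem 3.1,

$$V(A) = J_\phi \Sigma(A) J_\phi^\top = \mathrm{diag}\big(f'(\chi(A_i))^2 \sigma_i^2(A)\big) = cI.$$

It follows that $v_i^j(\alpha \mid A_{-i}) = 2c$ for any $i, j$, where $v_i^j$ is defined in Eq. (17), Theorem 3.1. Using
Condition (25),

$$
\arg\max_{\alpha_i \in \mathcal{A}_i} \left\{ \frac{f(\chi(\alpha_i))}{v_i^j(\alpha_i \mid A_{-i})^{1/2}} \right\}
= \arg\max_{\alpha_i \in \mathcal{A}_i} \left\{ \frac{f(\chi(\alpha_i))}{\sqrt{2c}} \right\}
= \arg\max_{\alpha_i \in \mathcal{A}_i} \{\chi(\alpha_i)\} = A_i^*.
$$

Thus, all conditions of Theorem 3.1 are fulfilled, and the design $\mathcal{D}$ is incentive-compatible. □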

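Theorem 4.2. Consider an incentive-compatible design $\mathcal{D} = (\psi, \phi)$, where action sets $\mathcal{A}_i \subset \mathbb{R}$ are
compact, and performance $\chi$ is one-to-one and continuous. Let,

$$\sqrt{k}\,\big(\phi_i(Y^{obs}) - \chi(A_i)\big) \xrightarrow{d} \mathcal{N}\big(0, \sigma^2(A_i)\big),$$

where function $\sigma^2 : \mathcal{A}_i \to \mathbb{R}^+$ satisfies

$$\chi(\alpha_i') > \chi(\alpha_i) \Rightarrow \sigma^2(\alpha_i') > \sigma^2(\alpha_i),$$

for every agent $i$, and all actions $\alpha_i', \alpha_i \in \mathcal{A}_i$. Consider a design $\mathcal{D}' = (\psi, \phi')$, where
$\phi_i'(Y^{obs}) = \nu(\phi_i(Y^{obs}))$, for each agent $i$, with $\nu(\cdot)$ defined by

$$\nu(y) = \int^{y} \frac{1}{\sqrt{\sigma^2(\chi^{-1}(z))}} \, dz.$$

Then, design $\mathcal{D}'$ is incentive-compatible and more powerful than $\mathcal{D}$, if $\nu(\cdot)$ is convex, or
$1/\sqrt{\sigma^2(\chi^{-1}(\cdot))}$ and $\sigma^2(\chi^{-1}(\cdot))$ are both convex.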

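Proof. From the univariate Delta theorem,

$$\sqrt{k}\,\big(\nu(\phi_i(Y^{obs})) - \nu(\chi(A_i))\big) \xrightarrow{d} \mathcal{N}(0, 1),$$

since $\nu'(\chi(A_i))^2 \sigma^2(A_i) = 1$, by Eq. (31). For brevity, set $\chi(A_i) \overset{\mathrm{def}}{=} \chi_i$ and
$\sigma^2(A_i) \overset{\mathrm{def}}{=} \sigma_i^2$. Without loss of generality, assume $\chi_i > \chi_j$. The probability that agent $i$
wins over agent $j$ in design $\mathcal{D}'$ is equal to,

$$P_i(A \mid \mathcal{D}') = \Phi\left( \sqrt{k/2}\,\big(\nu(\chi_i) - \nu(\chi_j)\big) \right).$$

In the old design, $\mathcal{D}$, this probability is equal to

$$P_i(A \mid \mathcal{D}) = \Phi\left( \sqrt{k}\, \frac{\chi_i - \chi_j}{\sqrt{\sigma_i^2 + \sigma_j^2}} \right).$$

Case 1 - Convex $\nu(\cdot)$. By convexity of $\nu$ we have

$$\frac{\nu(\chi_i) - \nu(\chi_j)}{\chi_i - \chi_j} \geq \nu'(\chi_j). \qquad (47)$$

By definition (29), $\nu'(\chi_j)^2 \sigma_j^2 = 1$. By property (30), $\sigma_i^2 > \sigma_j^2$ since $\chi_i > \chi_j$. Hence,
$\nu'(\chi_i)^2 \sigma_i^2 = 1 \Rightarrow \nu'(\chi_i)^2 < \nu'(\chi_j)^2$. It follows,

$$\nu'(\chi_j)^2 \sigma_j^2 + \nu'(\chi_j)^2 \sigma_i^2 > 2 \;\Rightarrow\; \nu'(\chi_j) > \frac{\sqrt{2}}{\sqrt{\sigma_i^2 + \sigma_j^2}}. \qquad (48)$$

Combining (47) and (48), we obtain

$$
\frac{\nu(\chi_i) - \nu(\chi_j)}{\sqrt{2}} > \frac{\chi_i - \chi_j}{\sqrt{\sigma_i^2 + \sigma_j^2}}
\;\Rightarrow\;
\Phi\left( \sqrt{k/2}\,\big(\nu(\chi_i) - \nu(\chi_j)\big) \right) > \Phi\left( \sqrt{k}\, \frac{\chi_i - \chi_j}{\sqrt{\sigma_i^2 + \sigma_j^2}} \right),
$$

which implies that design $\mathcal{D}'$ is more powerful than $\mathcal{D}$.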


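Case 2 - $1/\sqrt{\sigma^2(\chi^{-1}(\cdot))}$ and $\sigma^2(\chi^{-1}(\cdot))$ are both convex. It holds,

$$
\begin{aligned}
\frac{\nu(\chi_i) - \nu(\chi_j)}{\chi_i - \chi_j}
&= \frac{1}{\chi_i - \chi_j} \int_{\chi_j}^{\chi_i} \frac{1}{\sqrt{\sigma^2(\chi^{-1}(z))}} \, dz
\geq \frac{1}{\sqrt{\sigma^2\big(\chi^{-1}((\chi_j + \chi_i)/2)\big)}} \\
&\geq \frac{1}{\sqrt{\sigma^2(\chi^{-1}(\chi_j))/2 + \sigma^2(\chi^{-1}(\chi_i))/2}}
\overset{\mathrm{def}}{=} \frac{\sqrt{2}}{\sqrt{\sigma_i^2 + \sigma_j^2}}.
\end{aligned}
$$

The first inequality is obtained by convexity of $1/\sqrt{\sigma^2(\chi^{-1}(\cdot))}$, and the second by convexity of
$\sigma^2(\chi^{-1}(\cdot))$. To finish the proof we follow the same arguments as in Case 1. □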
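
To make the power comparison concrete, the following Python sketch evaluates both winning probabilities for one assumed variance function. It is an illustration, not a result from the paper: we assume $\sigma^2(\chi^{-1}(z)) = z$ (Poisson-like, variance equal to performance), for which $1/\sqrt{z}$ is convex and $z$ is linear, so the conditions of Case 2 hold, and $\nu(y) = 2\sqrt{y}$ is an antiderivative of $1/\sqrt{z}$. The performance values, `k = 50`, and the function names are hypothetical.

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def win_prob_original(chi_i, chi_j, k, sigma2):
    """Winning probability of agent i in the original design D."""
    return norm_cdf(sqrt(k) * (chi_i - chi_j) / sqrt(sigma2(chi_i) + sigma2(chi_j)))

def win_prob_stabilized(chi_i, chi_j, k, nu):
    """Winning probability of agent i in the variance-stabilized design D'."""
    return norm_cdf(sqrt(k / 2.0) * (nu(chi_i) - nu(chi_j)))

def sigma2(z):
    """Assumed variance as a function of performance: sigma^2(chi^{-1}(z)) = z."""
    return z

def nu(y):
    """Antiderivative of 1 / sqrt(sigma2(z)) under the assumption above."""
    return 2.0 * sqrt(y)

# Illustrative performances with agent i better than agent j.
chi_i, chi_j, k = 5.0, 4.0, 50
p_old = win_prob_original(chi_i, chi_j, k, sigma2)
p_new = win_prob_stabilized(chi_i, chi_j, k, nu)
```

Under these assumptions the better agent's winning probability is (slightly) higher in the stabilized design, in line with the inequality derived in Case 2.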


C Remarks on variance stabilization 

In Theorem 4.1, the variance of the score functions $\phi_i$ is stabilized (made constant) through a
transformation $f$. Transformations that stabilize the variance of a statistic are known as
variance-stabilizing transformations in statistics, and they are of fundamental importance in various
tasks, such as hypothesis testing and estimation. For example, consider the sample average $\bar{Y}$ of $n$
independent Poisson random variables with mean $\lambda$, so that $n\bar{Y} \sim \mathrm{Poisson}(n\lambda)$. In the limit,
$\sqrt{n}(\bar{Y} - \lambda) \xrightarrow{d} \mathcal{N}(0, \lambda)$. This asymptotic result is not directly useful for constructing a
confidence interval for the unknown parameter $\lambda$ because the variance of $\bar{Y}$ depends on that
unknown parameter. However, through the Delta theorem, $2\sqrt{n}(\sqrt{\bar{Y}} - \sqrt{\lambda}) \xrightarrow{d} \mathcal{N}(0, 1)$, i.e.,
the asymptotic variance of $\sqrt{\bar{Y}}$ does not depend on $\lambda$; the statistic $\sqrt{\bar{Y}}$ can be used to obtain
asymptotically valid confidence intervals for $\lambda$.
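
The Poisson example can be checked by simulation. The sketch below is a minimal illustration with assumed settings (`n = 400`, 20000 replications, a fixed seed, and two values of the mean): it compares the variance of the scaled sample average with the variance of the scaled square-root statistic. The function name `scaled_variances` is hypothetical.

```python
import numpy as np

def scaled_variances(lam, n, reps, rng):
    """Monte Carlo variances of sqrt(n)(Ybar - lam) and 2 sqrt(n)(sqrt(Ybar) - sqrt(lam))."""
    # The sum of n independent Poisson(lam) variables is Poisson(n * lam).
    sums = rng.poisson(lam * n, size=reps)
    ybar = sums / n
    raw = np.sqrt(n) * (ybar - lam)
    stabilized = 2.0 * np.sqrt(n) * (np.sqrt(ybar) - np.sqrt(lam))
    return raw.var(), stabilized.var()

rng = np.random.default_rng(0)
results = {lam: scaled_variances(lam, n=400, reps=20000, rng=rng)
           for lam in (2.0, 8.0)}

for lam, (raw_var, stab_var) in results.items():
    print(f"lambda={lam}: raw variance {raw_var:.2f}, stabilized variance {stab_var:.2f}")
```

The raw variance tracks the mean, while the stabilized variance stays close to one for both values of the mean, which is the property exploited by the score transformation $f$.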

In our setting, the variance stabilization helps to mitigate the risk-return trade-off that strategic
agents can undertake in an experiment. Loosely speaking, when the variance is stabilized, a
worse agent cannot benefit by being more risky, and a better agent cannot benefit by being more
conservative. Rather, incentives are aligned such that every agent will do its best, assuming the
remaining conditions of Theorem 4.1 are fulfilled.

D Discussion 

Our approach to designing incentive-compatible experiments has been through the use of an identifying
statistic, i.e., a statistic that can estimate the agent performances without competition. In many
situations, such a statistic exists, e.g., by using sample summaries (means, variances, etc.), and then
appealing to the central limit theorem. In most realistic cases, a key assumption will be that the
outcomes have a known parametric form. In this paper, we made such parametric assumptions in
our viral marketing example.

However, an experimenter might be unwilling to make such parametric modeling assumptions.
An alternative would then be either to use a nonparametric test for the quantities of interest (i.e.,
agent performances), or a randomization-based analysis. The former includes a wide class of
nonparametric methods, and we plan to investigate it in future work. It should be noted, however,
that even nonparametric tests have crucial underlying assumptions, e.g., exchangeability of observed
data, that are not easy to validate. In many situations, such assumptions are more critical than,
for example, normality assumptions that can be quite robust under many scenarios [3, Appendix
3A]. The latter method of randomization-based analysis usually starts from a null hypothesis which
aims to provide evidence for the likelihood of certain observed quantities, e.g., through p-values.
However, it is hard to test such hypotheses in our setting because agents can freely choose the
versions of the treatment to apply. Therefore, one cannot use the null hypothesis to impute
counterfactuals, i.e., outcomes that would have been observed under a different randomization, because
agents act in a strategic, non-random way.

In the case with interference, the assumption that an identifying statistic exists has two
components. First, it is required that the experimenter has a good idea about the model of interference,
e.g., that an agent action affects the outcomes for another agent linearly, as in Example 3(c).
Assumptions on the model of interference are frequent in practice because they help to deal with
interference after the experiment has been performed [2]. Second, it is required that the experimenter
knows exactly the hyperparameters of the assumed interference model. In the viral marketing
problem of Section 5, a scalar parameter $\gamma$ was used to model interference. In our examples, we assumed
that $\gamma$ was known. One way to avoid this problem is to treat such parameters of interference as
nuisance parameters, and then use a suitable statistical method; e.g., use profile likelihood instead
of the true, but unknown, likelihood to obtain proxies for the maximum-likelihood estimates. A
Bayesian approach would be to set priors for such parameters and then obtain a posterior predictive
distribution for the unknown agent performances. Agents would then be scored according to this
posterior distribution, but this would not alter the core of our methodology.